Genetics and Big Data


The overarching goal of this project is to identify a predictive, quantitative framework describing individual differences in genetic, epigenetic, cognitive, and behavioral markers of emotion-cognition regulation in response to academically stressful situations. Each year, large numbers of young adults drop out of college and university because of self-sabotaging and seemingly irrational behaviors when faced with academic stressors. This proposal takes a cross-disciplinary approach to understanding neurobiological functions and the resulting behaviors across a spectrum of neuro-typical and neuro-atypical young adults, the latter identified as those with diagnosed learning disabilities such as dyslexia, ADHD, and college-able autism. The project partnership includes faculty and students from the University of Vermont (sequencing data analyses), Landmark College (research subject recruitment), University of New Hampshire (research subject recruitment), University of Maine (model simulation), and Vermont Genetics Network.

Dawei’s group has done some trial work at MGHPCC and has been very pleased with the results. He would now like to scale up. Currently, running one sample takes 2 TB of storage and 5 days of processing with 64 GB of memory and 12 cores. The planned project has 3,000 samples, so the total storage requirement is 2 TB × 3,000 = 6 PB, and the computational time is estimated at 15,000 computing days (5 days × 3,000) using a single processor with 64 GB and 12 cores.
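The scaling arithmetic above can be reproduced with a short sketch. The per-sample figures (2 TB, 5 days, 3,000 samples) come from the description. The function names and the `wall_clock_days` helper are illustrative assumptions. That helper supposes the samples are independent and can be spread evenly across identical machines, which the project description does not state.

```python
import math

# Per-sample figures quoted in the project description.
TB_PER_SAMPLE = 2      # storage per sample, in TB
DAYS_PER_SAMPLE = 5    # processing days per sample (64 GB memory, 12 cores)
N_SAMPLES = 3_000      # planned number of samples


def total_storage_pb(n_samples=N_SAMPLES, tb_per_sample=TB_PER_SAMPLE):
    """Total storage in PB, taking 1 PB = 1,000 TB."""
    return n_samples * tb_per_sample / 1_000


def total_compute_days(n_samples=N_SAMPLES, days_per_sample=DAYS_PER_SAMPLE):
    """Total computing days if every sample runs one after another."""
    return n_samples * days_per_sample


def wall_clock_days(n_machines, n_samples=N_SAMPLES,
                    days_per_sample=DAYS_PER_SAMPLE):
    """Hypothetical elapsed days with n_machines identical machines.

    Assumption (not from the source): samples are independent, so each
    machine works through its share of samples one at a time.
    """
    batches = math.ceil(n_samples / n_machines)
    return batches * days_per_sample


if __name__ == "__main__":
    print(total_storage_pb())      # 6.0 (PB)
    print(total_compute_days())    # 15000 (computing days)
    print(wall_clock_days(100))    # 150 (days, under the stated assumption)
```

Under these assumptions, 100 machines would bring the elapsed time down from 15,000 days to about 150 days. The 6 PB figure likewise assumes that all per-sample data must be held at the same time.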

Student Research Computing Facilitator Profile:

Recommended: a graduate student with expertise in handling large data sets. An interest in biology would make the work more enjoyable but is not required.

Project Status: Complete
Project Mentor: Katia Oleinik
Student: Abigail Waters
Institution: University of Vermont


Genetic Algorithms and Support Vector Machines in Forest Mapping: A GIS-based model for wildlife conservation in Sri Lanka