Groha et al. arXiv. 2021
Genome-wide association studies (GWAS) can tell us where to look for genetic effects on disease, but not how these effects manifest themselves. Disentangling the underlying biological mechanisms poses the next great challenge for genetic analysis. At the same time, population measures of molecular phenotypes such as gene expression and chromatin activity are being collected at an unprecedented rate. We aim to develop statistical techniques for integrating molecular data to make sense of GWAS findings. Can we identify the disease-associated genes and their regulators? Can we make concrete statements about causality? Can molecular data help us efficiently identify the specific causal mutations, or prioritize targets for drug discovery? This work involves methods related to QTL analyses, genetic prediction, and making the most of summary-level GWAS data.
What regions of the genome are unusually important for a disease? Do features observed in specific cell types or conditions tend to harbor trait-affecting variants, and can they inform our understanding of trait etiology? Is the disease primarily driven by variants that disrupt coding sequence, by variants that have subtle effects on regulation, or by as-yet-unknown features? This work involves methods related to inference of heritability, variance component (or Gaussian process) models, and polygenic risk prediction.
Genomic data from hundreds of thousands of individuals are already available, and the volume is growing. Can we efficiently infer the relationships between individuals using only genetic data from massive cohorts? Can we then connect these relationships to phenotype or health records to inform our understanding of disease? Do certain subpopulations have unusual phenotype effects? To what extent are these differences driven by the demography of the population, environment, or selection? This work is at the interface of efficient computational methods, population genetics, and health record informatics.