Integrating molecular phenotypes to decipher disease mechanisms.

Genome-wide association studies (GWAS) can tell us where to look for genetic effects on disease, but not how these effects manifest. Disentangling the underlying biological mechanisms is the next great challenge for large-scale genetics. At the same time, population-scale measurements of molecular phenotypes such as gene expression and chromatin activity are being collected at an unprecedented rate. We aim to develop statistical techniques for integrating molecular data to make sense of GWAS findings. Can we identify the disease-associated genes and their regulators? Can we make concrete statements about causality? Can molecular data help us efficiently identify the specific causal mutations, or prioritize targets for drug discovery? This work involves methods related to QTL analyses, genetic prediction, and making the most of summary-level GWAS data.

Selected papers:

Transcriptome-wide association study of schizophrenia and chromatin activity yields mechanistic disease insights. bioRxiv. 2016

Integrating gene expression with summary association statistics to identify susceptibility genes for 30 complex traits. The American Journal of Human Genetics. 2017

Integrative approaches for large-scale transcriptome-wide association studies. Nature Genetics. 2016

Inferring trait architecture at genome scale.

What regions of the genome are unusually important for a disease? Do features observed in specific cell types or conditions tend to harbor trait-affecting variants, and can they inform our understanding of the trait etiology? Is the disease primarily driven by variants that disrupt coding sequence, by variants with subtle effects on regulation, or by as-yet-unknown features? This work involves methods related to inference of heritability, variance component (or Gaussian process) models, and polygenic risk prediction.

Selected papers:

Atlas of prostate cancer heritability in European and African-American men pinpoints tissue-specific regulation. Nature Communications. 2016

Partitioning heritability of regulatory and cell-type-specific variants across 11 common diseases. The American Journal of Human Genetics. 2014

Quantifying recent relatedness in massive cohorts.

Genomic data from hundreds of thousands of individuals are already available, and the volume keeps growing. Can we efficiently infer the relationships between individuals using only genetic data from massive cohorts? Can we then link these relationships to phenotypes or health records to inform our understanding of disease? Do certain subpopulations have unusual phenotype effects? To what extent are these differences driven by the demography of the population, environment, or selection? This work is at the interface of efficient computational methods, population genetics, and health record informatics.

Selected papers:

Whole population, genome-wide mapping of hidden relatedness. Genome Research. 2009

DASH: a method for identical-by-descent haplotype mapping uncovers association with recent variation. The American Journal of Human Genetics. 2011