gusevlab | Thoughts on our chromatin TWAS paper out in Nature Genetics

Our paper titled “Transcriptome-wide association study of schizophrenia and chromatin activity yields mechanistic disease insights” is now in print (see related discussion of the 2016 pre-print here).

The aim of the study was to move beyond genome-wide association studies – which implicate regions of the genome – to shortlisting the specific biological mechanisms that can explain individual genetic associations. Since the majority of common disease heritability appears to be non-coding, we specifically focused on identifying (1) genes whose expression is associated with disease through genetics, and (2) regulatory elements (e.g. enhancers) whose chromatin activity is associated with those genes. We accomplished this by building genetic models of expression for each gene in each tissue (aka TWAS models) and using them to predict expression into (1) a large schizophrenia GWAS to identify potential “susceptibility” genes, and (2) multiple smaller studies of chromatin activity in healthy individuals to identify potential “susceptibility” regulators. We showed that this approach can identify a putative target gene for approximately half of the known schizophrenia GWAS loci, as well as a target regulatory element for approximately half of those loci with target genes. We incorporated lots of external data to show that these predicted gene-regulator and gene-disease associations were supported by chromosome conformation (Hi-C), expression activity in the developing brain, statistical colocalization, and independent GWAS data. For one of these target genes, our fantastic collaborators in the Katsanis lab showed a consistent neurophenotypic effect in zebrafish, motivating further experimental work in this complicated locus. In general, we believe this is a useful framework for identifying GWAS target genes and their regulators with the kind of population-scale epigenomics data that is now becoming broadly available.
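For readers who have not seen a TWAS test written out, here is a minimal sketch of the usual summary-statistic form: the gene-disease z-score is a weighted sum of GWAS z-scores, scaled by the variance of the predicted expression, i.e. z = w'Z / sqrt(w' Σ w). The function name `twas_z` and the toy numbers are made up for illustration, and the sketch assumes the weights, z-scores, and LD matrix are already aligned to the same SNPs and alleles; it is not the actual pipeline used in the paper.

```python
import math


def twas_z(weights, gwas_z, ld):
    """Summary-statistic TWAS z-score: w'Z / sqrt(w' LD w).

    weights : per-SNP expression weights for one gene (hypothetical input)
    gwas_z  : per-SNP GWAS z-scores for the same SNPs, same allele coding
    ld      : SNP-by-SNP correlation (LD) matrix as a list of rows
    """
    n = len(weights)
    if len(gwas_z) != n or len(ld) != n or any(len(row) != n for row in ld):
        raise ValueError("weights, z-scores and LD matrix must share one SNP set")

    # Numerator: predicted-expression-weighted sum of GWAS z-scores.
    numerator = sum(w * z for w, z in zip(weights, gwas_z))

    # Denominator: variance of the predicted expression under the LD matrix.
    variance = sum(
        weights[i] * ld[i][j] * weights[j] for i in range(n) for j in range(n)
    )
    if variance <= 0:
        raise ValueError("predicted expression has no variance")

    return numerator / math.sqrt(variance)


# Toy example with three SNPs (all values invented for illustration).
weights = [0.5, 0.0, -0.3]
gwas_z = [3.0, 1.0, -2.0]
ld = [
    [1.0, 0.2, 0.0],
    [0.2, 1.0, 0.1],
    [0.0, 0.1, 1.0],
]
z = twas_z(weights, gwas_z, ld)
print(f"TWAS z-score: {z:.2f}")
```

The same calculation applies when the target trait is chromatin activity rather than disease: swap the GWAS z-scores for the z-scores from the chromatin association study.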

While these results are very promising, many challenges still remain in making sense of non-coding variation and disease:

# The paper presents a lot of statistical validation, but what we really want to know is how well the predicted regulatory-gene relationships validate experimentally. How often does perturbation of a predicted regulatory element alter the expression of its predicted target gene? How does this performance compare to other prioritization approaches such as Hi-C, colocalization, inter-tissue correlation, nearest gene, or just good old “manual inspection”? These questions may be answered by emerging experimental techniques such as crisprQTL (Gasperini et al.), which allow for interrogation of thousands of regulatory elements. A key advantage of our approach is that it can make a large number of predictions efficiently and so is very amenable to high-throughput validation. Whereas traditional one-locus-at-a-time validation efforts are probably better served by cherry-picking screaming QTL signals, the high-throughput approach is appealing because it would allow us to infer global patterns of causality.

# We were quite surprised to see no great enrichment of significant TWAS genes discovered in pre-frontal cortex relative to other tissues like blood and fat. We did find that the pre-frontal cortex models captured the majority of polygenic signal, but this is much more likely to be confounded by pleiotropic effects on expression and disease. The relationship between tissue-specific expression and disease is becoming clearer (e.g. Finucane et al. Nat Genet 2018) but how best to use this knowledge to maximize identification and validation of target genes remains an open question. This is especially relevant in light of recent work showing that integrating expression of the same gene across multiple tissues allows for the identification of more disease targets (Barbeira et al.).

# Because the gene expression studies had much larger sample sizes, we were constrained to build the TWAS models from expression, but ideally we would build joint models of genetic association across multiple molecular and disease phenotypes simultaneously. Particularly interesting methodological developments in this space are the approaches of Roytman et al. PLoS Genet 2018 for molecular fine-mapping, and Giambartolomei et al. Bioinformatics 2018 for GWAS colocalization.

See software/data for downloadable genome-wide results from this study.