Molecular data / GWAS integration


Methods and data for performing a transcriptome-wide, regulome-wide (or any other *ome-wide) association study with GWAS data.

REF: Integrative approaches for large-scale transcriptome-wide association studies. Gusev et al. Nature Genetics. 2016

REF: Allelic imbalance of chromatin accessibility in cancer identifies candidate causal risk variants and their mechanisms. Grishin et al. Nature Genetics. 2022

CWAS: Cistrome-Wide Association Studies

A workflow for training predictive models of the epigenomic “cistrome” and testing for association with GWAS disease data.

REF: Genetic determinants of chromatin reveal prostate cancer risk mediated by context-dependent gene regulation. Baca et al. Nature Genetics. 2022

MESC: Mediated Expression Score Regression

Method for quantifying the fraction of disease heritability mediated by all QTL effects.

REF: Quantifying genetic effects on disease mediated by assayed gene expression levels. Yao et al. Nature Genetics. 2020

Interactive browser for TWAS results from hundreds of complex traits.

SCZ chromatin TWAS

Data and analysis of chromatin/expression/splicing and schizophrenia.

REF: Transcriptome-wide association study of schizophrenia and chromatin activity yields mechanistic disease insights. Gusev et al. Nature Genetics. 2018

QTL discovery / fine-mapping


Method to identify cell-type specific QTL effects by leveraging allele-specific and total expression.

REF: DeCAF: A novel method to identify cell-type specific regulatory variants and their role in cancer risk. Kalita et al. Genome Biology. 2022

PLASMA: PopuLation Allele-Specific MApping

Method for fine-mapping functional data using eQTL and allelic-imbalance signal.

REF: Allele-Specific QTL Fine Mapping with PLASMA. Wang et al. AJHG. 2020


Method for identifying context-specific allelic imbalance and building allele-specific predictors.

REF: Allelic imbalance reveals widespread germline-somatic regulatory differences and prioritizes risk loci in Renal Cell Carcinoma. pre-print

Clinical Outcomes / Prediction

SurvLatent ODE

A generative, Neural ODE based time-to-event model for longitudinal data with competing risks.

REF: SurvLatent ODE: A Neural ODE based time-to-event model with competing risks for longitudinal data improves cancer-associated Venous Thromboembolism (VTE) prediction. Moon et al. Proceedings of Machine Learning Research. 2022

SurvNODE: Neural ODEs for Multi-State Survival Analysis

Method for inferring survival trajectories across multiple states (e.g. illness/death) using neural Ordinary Differential Equations (ODEs).

REF: A General Framework for Survival Analysis and Multi-State Modelling. Groha et al. arXiv. 2021


A workflow for germline imputation from tumors with quality control, ancestry inference, and polygenic risk scoring.

REF: Constructing germline research cohorts from the discarded reads of clinical tumor sequences. Gusev et al. Genome Med. 2021

Population Genetics


Method for identifying identical-by-descent segments in large genomic data.

REF: Identity-by-descent detection across 487,409 British samples reveals fine scale population structure and ultra-rare variant associations. Saada et al. Nature Communications. 2020


Method for detecting haplotypes shared identical-by-descent (IBD) and testing them for association with a trait. Infers haplotype clusters from IBD segments (for example, detected by the GERMLINE algorithm below), generating pseudo-SNP data for association testing.

REF: DASH: a method for identical-by-descent haplotype mapping uncovers association with recent variation. The American Journal of Human Genetics. 2011
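To make the "haplotype clusters to pseudo-SNP" idea concrete, here is a minimal sketch. It assumes IBD segments arrive as `(sample_a, sample_b, start, end)` tuples and groups samples by simple connected components at one position, which is a deliberate simplification and not the clustering used by the published method. The names `ibd_clusters` and `pseudo_snp` are hypothetical.

```python
def ibd_clusters(segments, position):
    """Group samples linked by IBD segments that cover `position`.

    Simplifying assumption: clusters are plain connected components
    (union-find), not the clustering used by the published method.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, start, end in segments:
        if start <= position <= end:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for sample in list(parent):
        clusters.setdefault(find(sample), []).append(sample)
    return sorted(sorted(members) for members in clusters.values())


def pseudo_snp(cluster, samples):
    """Encode cluster membership as a 0/1 pseudo-genotype per sample."""
    members = set(cluster)
    return [1 if s in members else 0 for s in samples]
```

Each cluster then yields one 0/1 vector across samples, which can be passed to any standard single-marker association test in place of a genotyped SNP.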



This code has been superseded by the FUSION software above. Legacy implementation archived here.

Methods for performing a Transcriptome-wide Association Study. Identifies associations between the genetic component of gene expression and a trait using eQTL and GWAS data only.

REF: Integrative approaches for large-scale transcriptome-wide association studies. Gusev et al. Nature Genetics. 2016
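As a rough illustration of the summary-based test described in the reference above, the sketch below combines per-SNP GWAS z-scores with expression prediction weights and an LD (SNP correlation) matrix into one gene-level z-score, Z = w'z / sqrt(w' LD w). The function name `twas_z` and the toy inputs are illustrative assumptions; this is not the archived code or its interface.

```python
import math


def twas_z(weights, gwas_z, ld):
    """Gene-level association z-score from summary data.

    weights : per-SNP expression prediction weights (w)
    gwas_z  : per-SNP GWAS z-scores (z), same SNP order as weights
    ld      : SNP-by-SNP correlation matrix as nested lists
    """
    n = len(weights)
    if len(gwas_z) != n or len(ld) != n:
        raise ValueError("weights, gwas_z and ld must cover the same SNPs")

    # Numerator: weighted sum of GWAS z-scores, w'z
    numerator = sum(w * z for w, z in zip(weights, gwas_z))

    # Variance of that sum under the null: w' LD w
    variance = sum(
        weights[i] * ld[i][j] * weights[j]
        for i in range(n)
        for j in range(n)
    )
    if variance <= 0:
        raise ValueError("w' LD w must be positive")

    return numerator / math.sqrt(variance)


# Toy usage: two SNPs in partial LD (all values are made up)
z = twas_z([0.5, 0.5], [2.0, 4.0], [[1.0, 0.2], [0.2, 1.0]])
```

Only summary-level inputs are needed here: the weights come from an expression reference panel, while the z-scores and LD stand in for individual-level GWAS genotypes.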


See GERMLINE2 above

Method for fast, pairwise detection of segments identical by descent. Uses hashing techniques to efficiently identify long stretches of shared DNA between pairs of individuals from array SNP data.

REF: Whole population, genome-wide mapping of hidden relatedness. Genome Research. 2009
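The hashing idea can be sketched as follows: cut each haplotype into fixed windows, bucket haplotypes by the exact allele string in each window, and treat haplotypes that share a bucket as seed matches that are then merged into longer segments. This is a toy version under stated assumptions (phased haplotypes given as equal-length strings, exact matching only, no extension past window boundaries or mismatch tolerance); `seed_matches` and `merge_windows` are hypothetical names, not the tool's interface.

```python
from collections import defaultdict
from itertools import combinations


def seed_matches(haplotypes, window):
    """Map each haplotype pair to the window indices where they are identical."""
    n_sites = len(next(iter(haplotypes.values())))
    matches = defaultdict(list)
    for w, start in enumerate(range(0, n_sites - window + 1, window)):
        # Hash step: haplotypes with the same allele string share a bucket
        buckets = defaultdict(list)
        for name, hap in haplotypes.items():
            buckets[hap[start:start + window]].append(name)
        # Only pairs inside a bucket are compared, never all pairs
        for names in buckets.values():
            for a, b in combinations(sorted(names), 2):
                matches[(a, b)].append(w)
    return dict(matches)


def merge_windows(windows, min_windows):
    """Merge consecutive window indices into (first, last) runs of sufficient length."""
    segments = []
    run_start = prev = windows[0]
    for w in windows[1:]:
        if w == prev + 1:
            prev = w
            continue
        if prev - run_start + 1 >= min_windows:
            segments.append((run_start, prev))
        run_start = prev = w
    if prev - run_start + 1 >= min_windows:
        segments.append((run_start, prev))
    return segments
```

Bucketing avoids comparing every pair of individuals at every window, which is the main reason a hash-based approach scales to large cohorts.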


Genotype phasing by entropy minimization.

REF: Highly scalable genotype phasing by entropy minimization. IEEE/ACM TCBB. 2008