TiSAn: estimating tissue-specific effects of coding and non-coding variants

Bioinformatics. 2018 Sep 15;34(18):3061-3068. doi: 10.1093/bioinformatics/bty301.

Abstract

Motivation: Model-based estimates of general deleteriousness, like CADD, DANN or PolyPhen, have become indispensable tools in the interpretation of genetic variants. However, these approaches say little about the tissues in which the effects of deleterious variants will be most meaningful. Tissue-specific annotations have been recently inferred for dozens of tissues/cell types from large collections of cross-tissue epigenomic data, and have demonstrated sensitivity in predicting affected tissues in complex traits. It remains unclear, however, whether including additional genome-scale data specific to the tissue of interest would appreciably improve functional annotations.

Results: Herein, we introduce TiSAn, a tool that integrates multiple genome-scale data sources, defined by expert knowledge. TiSAn uses machine learning to discriminate variants relevant to a tissue from those with no bearing on the function of that tissue. Predictions are made genome-wide, and can be used to contextualize and filter variants of interest in whole genome sequencing or genome-wide association studies. We demonstrate the accuracy and flexibility of TiSAn by producing predictive models for human heart and brain, and detecting tissue-relevant variations in large cohorts for autism spectrum disorder (TiSAn-brain) and coronary artery disease (TiSAn-heart). We find the multiomics TiSAn model is better able to prioritize genetic variants according to their tissue-specific action than the current state-of-the-art method, GenoSkyLine.

Availability and implementation: Software and vignettes are available at http://github.com/kevinVervier/TiSAn.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Autism Spectrum Disorder / genetics
  • Genome-Wide Association Study / methods
  • Heart Diseases / genetics
  • Humans
  • Machine Learning
  • Software*