Deep learning of genomic variation and regulatory network data.

The human genome is now investigated through high-throughput functional assays and through the generation of population genomic data. These advances support the identification of functional genetic variants (e.g. deleterious variants) and the prediction of traits (e.g. disease). This review summarizes lessons learned from large-scale analyses of genome and exome data sets, from the modeling of population data, and from machine-learning strategies for resolving complex genomic sequence regions. The review also describes the rapid adoption of artificial intelligence, in particular deep neural networks, in genomics: deep learning approaches are well suited to modeling the complex dependencies in the regulatory landscape of the genome and to providing predictors for genetic variant calling and interpretation.
