Machine Learning in Genomic Medicine: A Review of Computational Problems and Data Sets

In this paper, we provide an introduction to machine learning tasks that address important problems in genomic medicine. One of the goals of genomic medicine is to determine how variations in the DNA of individuals can affect the risk of different diseases, and to find causal explanations so that targeted therapies can be designed. Here we focus on how machine learning can help to model the relationship between DNA and the quantities of key molecules in the cell, with the premise that these quantities, which we refer to as cell variables, may be associated with disease risks. Modern biology allows high-throughput measurement of many such cell variables, including gene expression, splicing, and proteins binding to nucleic acids, which can all be treated as training targets for predictive models. With the growing availability of large-scale data sets and advanced computational techniques such as deep learning, researchers can help to usher in a new era of effective genomic medicine.
