Machine learning applications for therapeutic tasks with genomics data

Summary: Thanks to the increasing availability of genomics and other biomedical data, many machine learning algorithms have been proposed for a wide range of therapeutic discovery and development tasks. In this survey, we review the literature on machine learning applications for genomics through the lens of therapeutic development. We investigate the interplay among genomics, compounds, proteins, electronic health records, cellular images, and clinical texts. We identify 22 applications of machine learning in genomics that span the whole therapeutics pipeline, from discovering novel targets, personalizing medicine, and developing gene-editing tools to facilitating clinical trials and post-market studies. We also pinpoint seven key challenges in this field with potential for expansion and impact. This survey examines recent research at the intersection of machine learning, genomics, and therapeutic development.
