Neural Architecture Search for Joint Optimization of Predictive Power and Biological Knowledge

We report a neural architecture search framework, BioNAS, that is tailored for biomedical researchers to easily build, evaluate, and uncover novel knowledge from interpretable deep learning models. The introduction of knowledge dissimilarity functions in BioNAS enables the joint optimization of predictive power and biological knowledge through searching architectures in a model space. By optimizing the consistency with existing knowledge, we demonstrate that BioNAS optimal models reveal novel knowledge in both simulated data and in real data of functional genomics. BioNAS provides a useful tool for domain experts to inject their prior belief into automated machine learning and therefore making deep learning easily accessible to practitioners. BioNAS is available at this https URL.

[1]  Brendan J. Frey,et al.  A compendium of RNA-binding motifs for decoding gene regulation , 2013, Nature.

[2]  Yiming Yang,et al.  DARTS: Differentiable Architecture Search , 2018, ICLR.

[3]  Yingnian Wu,et al.  Deep-learning augmented RNA-seq analysis of transcript splicing , 2019, Nature Methods.

[4]  Rafael A. Irizarry,et al.  Interpretable Convolution Methods for Learning Genomic Sequence Motifs , 2018, bioRxiv.

[5]  W. Wong,et al.  CisModule: de novo discovery of cis-regulatory modules by hierarchical mixture modeling. , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[6]  Gene W. Yeo,et al.  Robust transcriptome-wide discovery of RNA binding protein binding sites with enhanced CLIP (eCLIP) , 2016, Nature Methods.

[7]  J. Michael Cherry,et al.  The Encyclopedia of DNA elements (ENCODE): data portal update , 2017, Nucleic Acids Res..

[8]  Anne E Carpenter,et al.  Opportunities and obstacles for deep learning in biology and medicine , 2017, bioRxiv.

[9]  Sebastian Thrun,et al.  Dermatologist-level classification of skin cancer with deep neural networks , 2017, Nature.

[10]  Carlos Guestrin,et al.  "Why Should I Trust You?": Explaining the Predictions of Any Classifier , 2016, ArXiv.

[11]  Charles Elkan,et al.  Fitting a Mixture Model By Expectation Maximization To Discover Motifs In Biopolymer , 1994, ISMB.

[12]  Avanti Shrikumar,et al.  Learning Important Features Through Propagating Activation Differences , 2017, ICML.

[13]  Benjamin J. Raphael,et al.  Visible Machine Learning for Biomedicine , 2018, Cell.

[14]  Christopher Y. Park,et al.  Whole-genome deep-learning analysis identifies contribution of noncoding mutations to autism risk , 2019, Nature Genetics.

[15]  P. D’haeseleer What are DNA sequence motifs? , 2006, Nature Biotechnology.

[16]  O. Troyanskaya,et al.  Predicting effects of noncoding variants with deep learning–based sequence model , 2015, Nature Methods.

[17]  Fabian J Theis,et al.  Deep learning: new computational modelling techniques for genomics , 2019, Nature Reviews Genetics.

[18]  B. Frey,et al.  The human splicing code reveals new insights into the genetic determinants of disease , 2015, Science.

[19]  M. Ares,et al.  Context-dependent control of alternative splicing by RNA-binding proteins , 2014, Nature Reviews Genetics.

[20]  Kirthevasan Kandasamy,et al.  Tuning Hyperparameters without Grad Students: Scalable and Robust Bayesian Optimisation with Dragonfly , 2019, J. Mach. Learn. Res..

[21]  Alec Radford,et al.  Proximal Policy Optimization Algorithms , 2017, ArXiv.

[22]  B. Frey,et al.  Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning , 2015, Nature Biotechnology.

[23]  Quanshi Zhang,et al.  Interpretable Convolutional Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[24]  Avanti Shrikumar,et al.  Separable Fully Connected Layers Improve Deep Learning Models For Genomics , 2017, bioRxiv.

[25]  Yi Xing,et al.  CLIP-seq analysis of multi-mapped reads discovers novel functional RNA regulatory sites in the human transcriptome , 2017, Nucleic acids research.

[26]  David R. Kelley,et al.  Basset: Learning the regulatory code of the accessible genome with deep convolutional neural networks , 2015 .