Global Importance Analysis: A Method to Quantify Importance of Genomic Features in Deep Neural Networks

Deep neural networks have demonstrated improved performance at predicting the sequence specificities of DNA- and RNA-binding proteins compared to previous methods that rely on k-mers and position weight matrices. To interpret these models, attribution methods have been employed to reveal learned patterns that resemble sequence motifs. However, first-order attribution methods only quantify the independent importance of single nucleotide variants in a given sequence; they do not provide the effect size of motifs, or of their interactions with other patterns, on model predictions. Here we introduce global importance analysis (GIA), a new model interpretability method that quantifies the population-level effect size that putative patterns have on model predictions. GIA provides an avenue to quantitatively test hypotheses about putative patterns and their interactions with other patterns, as well as to map out specific functions the network has learned. As a case study, we demonstrate the utility of GIA on the computational task of predicting RNA-protein interactions from sequence. We first introduce a new convolutional network, which we call ResidualBind, and benchmark its performance against previous methods on RNAcompete data. Using GIA, we then demonstrate that, in addition to sequence motifs, ResidualBind learns a model that considers the number of motifs, their spacing, and sequence context, such as RNA secondary structure and GC bias.
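To make the idea of a "population-level effect size" concrete, the sketch below estimates it as the mean change in a model's prediction when a putative pattern is embedded into sequences drawn from a background distribution. This is a minimal illustration under stated assumptions, not the authors' implementation: the background is assumed to be i.i.d. nucleotides, and `toy_model` is a hypothetical stand-in for a trained network such as ResidualBind that simply counts exact matches to `UGCAUG` (the well-known RBFOX binding site, used here only as an example). All function names are hypothetical. Embedding several patterns at chosen positions shows how the same measurement can probe motif number or spacing.

```python
import numpy as np

ALPHABET = "ACGU"


def one_hot(seq):
    """One-hot encode an RNA string into an array of shape (len(seq), 4)."""
    return np.eye(4)[[ALPHABET.index(c) for c in seq]]


def sample_background(num_seqs, length, rng, probs=None):
    """Assumed background: i.i.d. nucleotides (uniform unless probs is given)."""
    idx = rng.choice(4, size=(num_seqs, length), p=probs)
    return np.eye(4)[idx]  # shape (N, L, 4)


def embed(x, placements):
    """Return a copy of x with each (pattern, position) written into every sequence."""
    x = x.copy()
    for pattern, pos in placements:
        x[:, pos:pos + len(pattern), :] = one_hot(pattern)
    return x


def toy_model(x, motif="UGCAUG"):
    """Hypothetical stand-in model: number of exact motif matches per sequence."""
    m = one_hot(motif)
    k = len(motif)
    scores = np.stack(
        [np.einsum("njc,jc->n", x[:, i:i + k, :], m) for i in range(x.shape[1] - k + 1)],
        axis=1,
    )
    return (scores == k).sum(axis=1).astype(float)


def global_importance(model, background, placements):
    """Mean prediction difference with vs. without the embedded pattern(s)."""
    return float(np.mean(model(embed(background, placements)) - model(background)))


rng = np.random.default_rng(0)
bg = sample_background(1000, 40, rng)
gi_one = global_importance(toy_model, bg, [("UGCAUG", 10)])
gi_two = global_importance(toy_model, bg, [("UGCAUG", 5), ("UGCAUG", 25)])
gi_null = global_importance(toy_model, bg, [("AAAAAA", 10)])
print(gi_one, gi_two, gi_null)
```

Because this toy model is purely additive, two copies of the motif yield roughly twice the effect of one, and a non-motif control yields roughly zero. A learned model that weighs motif number, spacing, or context would depart from that additive baseline, which is the kind of deviation such an experiment is designed to expose.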
