Enhanced Integrated Gradients: improving interpretability of deep learning models using splicing codes as a case study

Despite the success and rapid adoption of deep learning models in biomedical domains, their lack of interpretability remains an issue. Here, we introduce Enhanced Integrated Gradients (EIG), a method to identify significant features associated with a specific prediction task. Using RNA splicing prediction and digit classification as case studies, we demonstrate that EIG improves upon the original Integrated Gradients method and produces sets of informative features. We then apply EIG to identify A1CF as a key regulator of liver-specific alternative splicing, and support this finding with subsequent analysis of relevant A1CF functional (RNA-seq) and binding (PAR-CLIP) data.
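For readers unfamiliar with the baseline method, the following is a minimal sketch of the original Integrated Gradients attribution that EIG builds on. It is not the EIG method itself. The toy logistic model, its weights, the all-zero baseline, and the function names are illustrative assumptions and do not come from this work. Gradients are approximated by finite differences so that the example needs no deep learning library.

```python
import math

def model(x):
    """Hypothetical toy model: logistic output of a fixed weighted sum."""
    w = [2.0, -1.0, 0.5]  # illustrative weights, not from the paper
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def gradient(f, x, eps=1e-6):
    """Central finite-difference approximation of the gradient of f at x."""
    grads = []
    for i in range(len(x)):
        hi, lo = list(x), list(x)
        hi[i] += eps
        lo[i] -= eps
        grads.append((f(hi) - f(lo)) / (2 * eps))
    return grads

def integrated_gradients(f, x, baseline, steps=200):
    """Original Integrated Gradients along the straight line from baseline to x.

    The path integral of the gradient is approximated with a midpoint
    Riemann sum over `steps` interpolation points.
    """
    n = len(x)
    avg_grad = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = gradient(f, point)
        for i in range(n):
            avg_grad[i] += g[i] / steps
    # Scale the averaged gradient by the input's displacement from the baseline.
    return [(xi - b) * a for xi, b, a in zip(x, baseline, avg_grad)]

x = [1.0, 2.0, -1.0]
baseline = [0.0, 0.0, 0.0]  # assumed all-zero reference input
attributions = integrated_gradients(model, x, baseline)

# Completeness check: attributions should sum to f(x) - f(baseline).
gap = abs(sum(attributions) - (model(x) - model(baseline)))
print(attributions, gap)
```

The final check illustrates the completeness property of Integrated Gradients: the per-feature attributions sum, up to numerical error, to the difference between the model output at the input and at the baseline.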
