Biological interpretation of deep neural network for phenotype prediction based on gene expression

Background: Predictive gene signatures are increasingly used to support clinical decision-making. Deep learning has great potential for predicting phenotype from gene expression profiles. However, neural networks are viewed as black boxes that provide accurate predictions without any explanation. The demand for interpretable models is growing, especially in the medical field.

Results: We focus on explaining the predictions of a deep neural network model built from gene expression data. The neurons and genes that most influence the predictions are identified and linked to biological knowledge. Our experiments on cancer prediction show that (1) the deep learning approach outperforms classical machine learning methods on large training sets; (2) our approach produces interpretations more coherent with biology than state-of-the-art approaches; and (3) we can provide a comprehensive explanation of the predictions for biologists and physicians.

Conclusion: We propose an original approach for the biological interpretation of deep learning models for phenotype prediction from gene expression data. Since the model can find relationships between the phenotype and gene expression, we may assume that there is a link between the identified genes and the phenotype. The interpretation can therefore lead to new biological hypotheses to be investigated by biologists.
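
As an illustration of what identifying "the most important neurons and genes" can look like in practice, the following is a minimal sketch using a gradient × input rule on a toy one-hidden-layer ReLU network. The abstract does not state which attribution method the authors use, so this rule is an assumption chosen for simplicity and should not be read as their procedure. The weights, the gene names (`GENE_A`, `GENE_B`, `GENE_C`), and the expression profile are all hypothetical.

```python
# Toy sketch: gradient x input attribution on a one-hidden-layer ReLU network.
# Assumption: weights, gene names, and expression values are invented for
# illustration; this is not the model or method from the paper.

def relu(v):
    return v if v > 0.0 else 0.0

def forward(x, W1, b1, w2, b2):
    """Return the hidden activations and the scalar output score."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    score = sum(w * h for w, h in zip(w2, hidden)) + b2
    return hidden, score

def attributions(x, W1, b1, w2, b2):
    """Gradient x input relevance for hidden neurons and input genes."""
    hidden, score = forward(x, W1, b1, w2, b2)
    # d(score)/d(hidden_j) = w2[j], so neuron relevance is w2[j] * hidden_j.
    neuron_scores = [w * h for w, h in zip(w2, hidden)]
    # d(score)/d(x_i) sums over the hidden units that are active (ReLU > 0).
    grads = [sum(w2[j] * W1[j][i] for j in range(len(hidden)) if hidden[j] > 0.0)
             for i in range(len(x))]
    gene_scores = [g * xi for g, xi in zip(grads, x)]
    return score, neuron_scores, gene_scores

genes = ["GENE_A", "GENE_B", "GENE_C"]   # hypothetical gene names
x = [2.0, 1.0, 0.5]                      # hypothetical expression profile
W1 = [[1.0, -1.0, 0.0],
      [0.5, 0.5, -2.0],
      [-1.0, 0.0, 1.0]]
b1 = [0.0, 0.0, 0.0]
w2 = [1.0, -0.5, 2.0]
b2 = 0.0

score, neuron_scores, gene_scores = attributions(x, W1, b1, w2, b2)
# Rank genes by the magnitude of their contribution to the score.
ranking = sorted(zip(genes, gene_scores), key=lambda p: abs(p[1]), reverse=True)
print(score, neuron_scores, ranking)
```

With all biases set to zero, the gene scores sum to the output score, which makes the decomposition easy to check by hand. In a real setting, the ranked genes would then be compared against pathway or ontology annotations to link them to biological knowledge.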
