Learning & Visualizing Genomic Signatures of Cancer Tumors using Deep Neural Networks

Applying deep learning to genomics-based medical diagnosis is challenging because the data are high-dimensional and patient samples are scarce. A further challenge is that deep models are often perceived as black boxes that offer little insight into how they make predictions. We propose a deep transfer learning framework for cancer diagnosis that learns from the DNA and RNA sequences of cancer cells and identifies genetic changes that alter cell behavior and cause uncontrolled growth and malignancy. We design a new Convolutional Neural Network architecture that learns genomic signatures from whole-transcriptome gene expression data collected from multiple tumor types covering multiple organ sites. We demonstrate how the trained model can function as a comprehensive multi-tissue cancer classifier by using transfer learning to build classifiers for tumor types that lack enough human samples to be trained independently. We introduce visualization procedures that provide more biological insight into how the model learns genomic signatures and makes accurate predictions across multiple cancer tissue types.
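
The transfer learning idea in the abstract can be illustrated with a minimal sketch: convolutional filters are kept frozen, and only a small classifier head is trained on a tumor type with few samples. This is not the architecture described in the paper. The filters in `FROZEN_KERNELS`, the synthetic data from `make_toy_samples`, and all function names are hypothetical and chosen only for illustration; in practice the frozen filters would come from a network trained on data-rich tumor types.

```python
import math
import random

# Hypothetical stand-ins for convolution filters already learned on
# data-rich tumor types; stored as tuples so they are never updated.
FROZEN_KERNELS = ((1 / 3, 1 / 3, 1 / 3), (1.0, -1.0, 0.0))


def extract_features(expression, kernels=FROZEN_KERNELS):
    """Apply each frozen 1D filter, then ReLU and global average pooling."""
    features = []
    for kernel in kernels:
        width = len(kernel)
        activations = [
            max(0.0, sum(expression[i + j] * kernel[j] for j in range(width)))
            for i in range(len(expression) - width + 1)
        ]
        features.append(sum(activations) / len(activations))
    return features


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(features, labels, epochs=2000, lr=1.0):
    """Fit only a new logistic-regression head with batch gradient descent."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for f, y in zip(features, labels):
            error = sigmoid(sum(w * v for w, v in zip(weights, f)) + bias) - y
            grad_w = [g + error * v for g, v in zip(grad_w, f)]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(expression, weights, bias):
    """Return 1 if the head's logit on the frozen features is positive."""
    f = extract_features(expression)
    return 1 if sum(w * v for w, v in zip(weights, f)) + bias > 0 else 0


def make_toy_samples(n_per_class=4, n_genes=8, seed=0):
    """Synthetic data: class 1 has uniformly elevated expression values."""
    rng = random.Random(seed)
    samples, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            samples.append([0.8 * label + 0.2 * rng.random() for _ in range(n_genes)])
            labels.append(label)
    return samples, labels


# Only the head is trained; the feature extractor is reused unchanged.
samples, labels = make_toy_samples()
weights, bias = train_head([extract_features(s) for s in samples], labels)
accuracy = sum(predict(s, weights, bias) == y for s, y in zip(samples, labels)) / len(labels)
print(f"training accuracy on the toy set: {accuracy:.2f}")
```

Because the head has only a few parameters, it can be fitted from a handful of samples, which is the reason for reusing features in the low-sample setting. A real pipeline would also evaluate on held-out patients rather than on the training set.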
