Multi-Tissue Cancer Classification of Gene Expressions using Deep Learning

Cancer classification from gene expression data is challenging because the data are complex and high-dimensional. Current classification methods typically rely on samples collected from a single tissue type and require a prior gene feature selection step to avoid processing the full set of genes. These methods do not take full advantage of genome-wide Next Generation Sequencing technologies, which provide a snapshot of the whole transcriptome rather than a predetermined subset of genes. We propose a deep learning framework for cancer diagnosis: a multi-tissue cancer classifier based on whole-transcriptome gene expression data collected from multiple tumor types. We introduce a new Convolutional Neural Network architecture designed specifically for the complex nature of whole-transcriptome gene expression data. The architecture is capable of detecting genetic alterations driving cancer progression by learning genomic signatures across multiple tissue types, without requiring prior gene feature selection. Our model achieves 98.9% classification accuracy on human samples representing 33 different cancer tumor types across 26 organ sites.
