A Deep Learning Approach for Cancer Detection and Relevant Gene Identification

Cancer detection from gene expression data remains challenging because these data are high-dimensional and complex. After decades of research, there is still uncertainty in the clinical diagnosis of cancer and in the identification of tumor-specific markers. Here we present a deep learning approach to cancer detection and to the identification of genes critical for the diagnosis of breast cancer. First, we used a Stacked Denoising Autoencoder (SDAE) to extract functional features from high-dimensional gene expression profiles. Next, we evaluated the extracted representation with supervised classification models to verify that the new features are useful for cancer detection. Lastly, we identified a set of highly interactive genes by analyzing the SDAE connectivity matrices. Our results and analysis suggest that these highly interactive genes could serve as biomarkers for the detection of breast cancer and deserve further study.
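The pipeline described above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the implementation used in this work: the abstract does not specify the layer sizes, noise type, training procedure, or how the connectivity matrices were analyzed. The sketch assumes masking noise, tied weights, a linear decoder, greedy layer-wise training, and a gene score defined as the summed absolute weight in the product of the layer weight matrices. The function names (`train_denoising_layer`, `rank_genes`) and the synthetic data are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_denoising_layer(X, n_hidden, noise=0.2, lr=0.05, epochs=200, seed=0):
    """Train one tied-weight denoising autoencoder layer (assumed setup).

    Inputs are corrupted with masking noise; the layer learns to
    reconstruct the clean input through a sigmoid encoder and a
    linear decoder that reuses the encoder weights.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_in = X.shape
    W = rng.normal(0.0, 0.1, size=(n_in, n_hidden))
    b_h = np.zeros(n_hidden)
    b_o = np.zeros(n_in)
    for _ in range(epochs):
        mask = rng.random(X.shape) > noise      # zero out ~noise of the entries
        X_noisy = X * mask
        H = sigmoid(X_noisy @ W + b_h)          # encode the corrupted input
        X_hat = H @ W.T + b_o                   # decode with tied weights
        err = X_hat - X                         # compare against the clean input
        dZ = (err @ W) * H * (1.0 - H)          # backprop through the encoder
        grad_W = (X_noisy.T @ dZ + err.T @ H) / n_samples
        W -= lr * grad_W
        b_h -= lr * dZ.mean(axis=0)
        b_o -= lr * err.mean(axis=0)
    return W, b_h


def rank_genes(weights, top_k):
    """Rank input genes by summed absolute connectivity to the top layer.

    Assumption: connectivity is approximated by the product of the
    layer weight matrices, ignoring the nonlinearities.
    """
    C = weights[0]
    for W in weights[1:]:
        C = C @ W
    scores = np.abs(C).sum(axis=1)
    return np.argsort(scores)[::-1][:top_k]


# Synthetic stand-in for an expression matrix (samples x genes), scaled to [0, 1].
rng = np.random.default_rng(42)
X = rng.random((60, 20))

# Greedy layer-wise training of a two-layer stack.
W1, b1 = train_denoising_layer(X, n_hidden=8)
H1 = sigmoid(X @ W1 + b1)
W2, b2 = train_denoising_layer(H1, n_hidden=4, seed=1)
features = sigmoid(H1 @ W2 + b2)    # low-dimensional representation for a classifier

top_genes = rank_genes([W1, W2], top_k=5)
print(top_genes)
```

The `features` array plays the role of the extracted representation that would be passed to a supervised classifier, and `top_genes` holds the column indices of the inputs with the largest aggregate connectivity under the assumed scoring rule.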
