A Deep Learning Model for Predicting Tumor Suppressor Genes and Oncogenes from PDB Structure

Although cancer is a heterogeneous collection of distinct diseases, the common mechanism underlying uncontrolled tumor growth is mutation of proto-oncogenes and loss of the regulatory function of tumor suppressor genes. In this paper we propose a novel deep learning model for predicting tumor suppressor genes (TSGs) and proto-oncogenes (OGs) from their three-dimensional structures in the Protein Data Bank (PDB). Specifically, we develop a convolutional neural network (CNN) to classify feature map sets extracted from tertiary protein structures. Each feature map set represents particular biological features associated with the atomic coordinates on the outer surface of the protein's three-dimensional structure. Experimental results on the collected dataset show promising performance in classifying TSGs and OGs, with 82.57% accuracy and an area under the ROC curve (AUC) of 0.89. The initial success of the proposed model warrants further study toward a comprehensive model that identifies cancer driver genes or events using the principal cancer genes (TSGs and OGs).
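The two figures reported above measure different things. Accuracy depends on a chosen decision threshold, while AUC summarizes how well the classifier ranks positives above negatives across all thresholds. The following plain-Python sketch shows the standard definitions of both metrics. It is not the authors' evaluation code: the 0/1 label encoding (hypothetically OG = 1, TSG = 0), the 0.5 threshold, and the toy scores are assumptions for illustration only.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the 0/1 label."""
    # A score >= threshold is predicted as the positive class (assumed OG = 1).
    correct = sum((s >= threshold) == bool(y) for y, s in zip(labels, scores))
    return correct / len(labels)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three positives (OG) and three negatives (TSG).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]

print(f"accuracy={accuracy(labels, scores):.4f}  AUC={roc_auc(labels, scores):.4f}")
# prints "accuracy=0.6667  AUC=0.8889"
```

The pairwise definition used here is equivalent to the normalized Mann-Whitney U statistic; it costs O(n_pos × n_neg) comparisons, so a rank-based formulation is preferable for large evaluation sets.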
