Towards recognition of protein function based on its structure using deep convolutional networks

This paper proposes a novel method for protein function recognition using deep learning. Deep convolutional neural networks (DCNNs) have recently demonstrated high performance in many areas of pattern recognition. A protein's function is often associated with its tertiary structure, which denotes the active domain of the protein. This investigation develops a DCNN that recognizes protein function from tertiary structure. Two rounds of experiments are performed. The initial experiment, on tertiary protein structure alignment, achieves a 94% accuracy rate and shows that the model is robust to rotations, local translations, and scaling of the 3D structure. Building on these results, the main experiments use five datasets constructed from similarity measures between pairs of gene ontology terms. On these datasets, protein function recognition reaches a maximum accuracy of 87.6% and an average accuracy of 80.7%. This initial success of the DCNN in tertiary protein structure recognition supports further investigation of tertiary protein retrieval and pattern mining on large-scale problems.
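The abstract does not say which similarity measure between gene ontology terms was used to build the datasets. As an illustration only, the sketch below assumes Lin's information-theoretic similarity, a widely used choice for ontology terms: sim(a, b) = 2 · IC(c) / (IC(a) + IC(b)), where c is the most informative common ancestor of a and b and IC(t) = -log p(t). The ontology, term names, and annotation counts are hypothetical toy values, not real gene ontology data and not the authors' setup.

```python
import math

# Hypothetical toy ontology (child -> parents); NOT real gene ontology data.
PARENTS = {
    "root": set(),
    "binding": {"root"},
    "catalysis": {"root"},
    "ion_binding": {"binding"},
    "protein_binding": {"binding"},
}

# Hypothetical number of proteins annotated directly with each term.
DIRECT_COUNTS = {
    "root": 0,
    "binding": 2,
    "catalysis": 8,
    "ion_binding": 6,
    "protein_binding": 4,
}


def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log p(t), where p(t) counts annotations of t and its descendants."""
    cumulative = {term: 0 for term in PARENTS}
    for term, count in DIRECT_COUNTS.items():
        for anc in ancestors(term):
            cumulative[anc] += count
    total = cumulative["root"]
    # log(total / n) equals -log(n / total); this form avoids a negative zero at the root.
    return {term: math.log(total / cumulative[term]) for term in PARENTS}


def lin_similarity(a, b, ic):
    """Lin similarity: 2 * IC(most informative common ancestor) / (IC(a) + IC(b))."""
    common = ancestors(a) & ancestors(b)
    mica_ic = max(ic[term] for term in common)
    denom = ic[a] + ic[b]
    if denom == 0.0:
        return 1.0  # both terms are the root, which carries no information
    return 2.0 * mica_ic / denom


IC = information_content()
sibling_sim = lin_similarity("ion_binding", "protein_binding", IC)
unrelated_sim = lin_similarity("ion_binding", "catalysis", IC)
```

Under these toy counts, two terms sharing the specific parent `binding` score higher than two terms whose only common ancestor is the root. Datasets of related functions could then be formed by thresholding such pairwise scores, though the threshold and grouping procedure are not given in the abstract.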
