DeEPn: a deep neural network based tool for enzyme functional annotation

With the advancement of high-throughput techniques, the discovery rate of enzyme sequences has increased significantly in recent years. Each of these raw sequences must be precisely mapped to its functional attributes, which helps in deciphering its biological role. Various prediction models have been proposed to predict the enzyme functional class; however, these models cover at most six functional enzyme classes (EC1 to EC6) of the seven existing classes, which makes them unsuitable for enzymes belonging to the seventh functional class (EC7). In this study, a deep neural network-based approach, DeEPn, is proposed that can classify enzymes belonging to all seven functional classes with high precision and accuracy. The proposed model was compared with two recently developed tools, ECPred and SVM-Prot. The results demonstrated that DeEPn outperformed ECPred and SVM-Prot in terms of predictive quality. DeEPn is hosted as a web-based tool at https://bioserver.iiita.ac.in/DeEPn/.
