Automatic single- and multi-label enzymatic function prediction by machine learning

The number of protein structures in the PDB database has increased more than 15-fold since 1999. Computational models that predict enzymatic function are of major importance, since they provide the means to better understand the behavior of newly discovered enzymes when catalyzing chemical reactions. Until now, enzymatic function prediction has mostly been treated as a single-label classification problem, which limits its application to enzymes performing a unique reaction and introduces errors when multi-functional enzymes are examined. Indeed, some enzymes catalyze several different reactions and can hence be directly associated with multiple enzymatic functions. In the present work, we propose a multi-label enzymatic function classification scheme that combines structural and amino acid sequence information. We investigate two fusion approaches (at the feature level and at the decision level) and assess the methodology for general enzymatic function prediction, indicated by the first digit of the Enzyme Commission (EC) code (six main classes), on 40,034 enzymes from the PDB database. The proposed single-label and multi-label models correctly predict the actual functional activities in 97.8% and 95.5% (based on Hamming loss) of the cases, respectively. In addition, the multi-label model predicts all possible enzymatic reactions for 85.4% of the multi-labeled enzymes when the number of reactions is unknown. Code and datasets are available at https://figshare.com/s/a63e0bafa9b71fc7cbd7.
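To make the two multi-label figures in the abstract concrete, the sketch below computes a Hamming-loss-based accuracy and an exact-match ratio on a toy label matrix with six columns, one per EC main class. It assumes that the 95.5% figure corresponds to one minus the Hamming loss and that the 85.4% figure corresponds to the fraction of enzymes whose full label set is predicted exactly; the abstract does not state either definition. The function names and the toy data are hypothetical and are not taken from the authors' released code.

```python
import numpy as np


def hamming_loss(y_true, y_pred):
    """Fraction of (enzyme, EC class) label slots that are predicted incorrectly."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    return float(np.mean(y_true != y_pred))


def exact_match_ratio(y_true, y_pred):
    """Fraction of enzymes whose complete label set is predicted exactly."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    return float(np.mean(np.all(y_true == y_pred, axis=1)))


# Toy data (illustrative only): 4 enzymes x 6 EC main classes, 1 = class assigned.
y_true = np.array([
    [1, 0, 0, 0, 0, 0],  # single-label enzyme
    [0, 1, 1, 0, 0, 0],  # multi-functional enzyme
    [0, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 0, 1],  # multi-functional enzyme
])
y_pred = np.array([
    [1, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 0, 0],  # misses the sixth class
])

loss = hamming_loss(y_true, y_pred)        # 1 wrong slot out of 24
label_accuracy = 1.0 - loss                # Hamming-loss-based accuracy
exact = exact_match_ratio(y_true, y_pred)  # 3 of 4 enzymes fully correct

print(f"Hamming loss: {loss:.4f}")
print(f"Label-wise accuracy: {label_accuracy:.4f}")
print(f"Exact-match ratio: {exact:.2f}")
```

The example shows why the two numbers differ: a single missed class lowers the label-wise accuracy only slightly, while the whole enzyme counts as a failure under exact matching.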
