Alignment-Free Method to Predict Enzyme Classes and Subclasses

The Enzyme Commission (EC) number is a numerical classification scheme for enzymes based on the chemical reactions they catalyze. The scheme follows the recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology. The first Enzyme Classification and Nomenclature List, reported by the International Union of Biochemistry in 1961, recognised six enzyme classes. Recently, a seventh class was added because the six existing EC classes could not describe enzymes that move ions or molecules across membranes; such enzymes are now classified as translocases (EC 7). Several computational methods have been developed to predict EC numbers, but because of this change all of them are now outdated and need updating. In this work, we developed a new multi-task quantitative structure–activity relationship (QSAR) method that predicts all seven EC classes and their subclasses. The resulting alignment-free model, based on artificial neural networks, proved to be very successful.
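To make the prediction targets concrete, the following minimal sketch shows how an EC number string might be split into the class and subclass labels that a multi-task model would predict. The helper name `parse_ec` and the handling of `-` placeholders are illustrative assumptions, not part of the method described here; the seven class names follow the standard EC scheme.

```python
from typing import Optional, Tuple

# The seven top-level EC classes, including the translocases (EC 7).
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}


def parse_ec(ec: str) -> Tuple[int, Optional[int]]:
    """Return (class, subclass) from an EC number such as 'EC 7.1.1.2'.

    Hypothetical helper for illustration: the subclass is None when the
    second field is missing or given as the '-' placeholder.
    """
    text = ec.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    fields = text.split(".")
    main_class = int(fields[0])
    if main_class not in EC_CLASSES:
        raise ValueError(f"unknown EC class: {main_class}")
    subclass = None
    if len(fields) > 1 and fields[1] not in ("", "-"):
        subclass = int(fields[1])
    return main_class, subclass


if __name__ == "__main__":
    for example in ("EC 7.1.1.2", "3.4.-.-", "2.-.-.-"):
        cls, sub = parse_ec(example)
        print(example, "->", EC_CLASSES[cls], "subclass", sub)
```

A method limited to classes 1 to 6 has no label for any sequence whose first field is 7, which is the gap the seven-class model is meant to close.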
