Predicting the network of substrate-enzyme-product triads by combining compound similarity and functional domain composition

BackgroundMetabolic pathway is a highly regulated network consisting of many metabolic reactions involving substrates, enzymes, and products, where substrates can be transformed into products with particular catalytic enzymes. Since experimental determination of the network of substrate-enzyme-product triad (whether the substrate can be transformed into the product with a given enzyme) is both time-consuming and expensive, it would be very useful to develop a computational approach for predicting the network of substrate-enzyme-product triads.ResultsA mathematical model for predicting the network of substrate-enzyme-product triads was developed. Meanwhile, a benchmark dataset was constructed that contains 744,192 substrate-enzyme-product triads, of which 14,592 are networking triads, and 729,600 are non-networking triads; i.e., the number of the negative triads was about 50 times the number of the positive triads. The molecular graph was introduced to calculate the similarity between the substrate compounds and between the product compounds, while the functional domain composition was introduced to calculate the similarity between enzyme molecules. The nearest neighbour algorithm was utilized as a prediction engine, in which a novel metric was introduced to measure the "nearness" between triads. To train and test the prediction engine, one tenth of the positive triads and one tenth of the negative triads were randomly picked from the benchmark dataset as the testing samples, while the remaining were used to train the prediction model. It was observed that the overall success rate in predicting the network for the testing samples was 98.71%, with 95.41% success rate for the 1,460 testing networking triads and 98.77% for the 72,960 testing non-networking triads.ConclusionsIt is quite promising and encouraged to use the molecular graph to calculate the similarity between compounds and use the functional domain composition to calculate the similarity between enzymes for studying the substrate-enzyme-product network system. The software is available upon request.

[1]  J. Chou,et al.  Kinetic studies with the non-nucleoside HIV-1 reverse transcriptase inhibitor U-88204E. , 1993, Biochemistry.

[2]  Christian von Mering,et al.  STITCH: interaction networks of chemicals and proteins , 2007, Nucleic Acids Res..

[3]  K. Chou Structural bioinformatics and its impact to biomedical science. , 2004, Current medicinal chemistry.

[4]  Keinosuke Fukunaga,et al.  Introduction to statistical pattern recognition (2nd ed.) , 1990 .

[5]  Chris H. Q. Ding,et al.  Multi-class protein fold recognition using support vector machines and neural networks , 2001, Bioinform..

[6]  Kuo-Chen Chou,et al.  Predicting protein structural class by functional domain composition. , 2004, Biochemical and biophysical research communications.

[7]  Peilin Jia,et al.  Prediction of subcellular protein localization based on functional domain composition. , 2007, Biochemical and biophysical research communications.

[8]  Ziliang Qian,et al.  Prediction of peptidase category based on functional domain composition. , 2008, Journal of proteome research.

[9]  K. Chou,et al.  Recent progress in protein subcellular location prediction. , 2007, Analytical biochemistry.

[10]  C. Kuo-chen,et al.  FoldRate: A Web-Server for Predicting Protein Folding Rates from Primary Sequence , 2009 .

[11]  K. Chou,et al.  A new schematic method in enzyme kinetics. , 2005, European journal of biochemistry.

[12]  John M. Barnard,et al.  Chemical Similarity Searching , 1998, J. Chem. Inf. Comput. Sci..

[13]  C. Kuo-chen,et al.  Graphical rules for non-steady state enzyme kinetics. , 1981, Journal of theoretical biology.

[14]  Kuo-Chen Chou,et al.  Nearest neighbour algorithm for predicting protein subcellular location by combining functional domain composition and pseudo-amino acid composition. , 2003, Biochemical and biophysical research communications.

[15]  K. Chou Pseudo Amino Acid Composition and its Applications in Bioinformatics, Proteomics and System Biology , 2009 .

[16]  K. Chou,et al.  REVIEW : Recent advances in developing web-servers for predicting protein attributes , 2009 .

[17]  J. Andraos Kinetic plasticity and the determination of product ratios for kinetic schemes leading to multiple products without rate laws — New methods based on directed graphs , 2008 .

[18]  Xiaoyong Zou,et al.  Prediction of protein secondary structure content by using the concept of Chou's pseudo amino acid composition and support vector machine. , 2009, Protein and peptide letters.

[19]  Malcolm J. McGregor,et al.  Clustering of Large Databases of Compounds: Using the MDL "Keys" as Structural Descriptors , 1997, J. Chem. Inf. Comput. Sci..

[20]  Pierre Baldi,et al.  Assessing the accuracy of prediction algorithms for classification: an overview , 2000, Bioinform..

[21]  K. Chou,et al.  ProtIdent: a web server for identifying proteases and their types by fusing functional domain and sequential evolution information. , 2008, Biochemical and biophysical research communications.

[22]  L. Resnick,et al.  The quinoline U-78036 is a potent inhibitor of HIV-1 reverse transcriptase. , 1993, The Journal of biological chemistry.

[23]  Peter E. Hart,et al.  Nearest neighbor pattern classification , 1967, IEEE Trans. Inf. Theory.

[24]  K. Chou,et al.  Prediction of protein subcellular locations by GO-FunD-PseAA predictor. , 2004, Biochemical and biophysical research communications.

[25]  Susumu Goto,et al.  LIGAND: chemical database of enzyme reactions , 2000, Nucleic Acids Res..

[26]  Sándor Pongor,et al.  The SBASE protein domain library, Release 4.0: a collection of annotated protein sequence segments , 1993, Nucleic Acids Res..

[27]  Kuo-Chen Chou,et al.  Using functional domain composition to predict enzyme family classes. , 2005, Journal of proteome research.

[28]  M. Kanehisa,et al.  Development of a chemical structure comparison method for integrated analysis of chemical and genomic information in the metabolic pathways. , 2003, Journal of the American Chemical Society.

[29]  K. Chou,et al.  EzyPred: a top-down approach for predicting enzyme functional classes and subclasses. , 2007, Biochemical and biophysical research communications.

[30]  Kuo-Chen Chou,et al.  Predicting enzyme subclass by functional domain composition and pseudo amino acid composition. , 2005, Journal of proteome research.

[31]  K. Chou Applications of graph theory to enzyme kinetics and protein folding kinetics. Steady and non-steady-state systems. , 2020, Biophysical chemistry.

[32]  David Weininger,et al.  SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules , 1988, J. Chem. Inf. Comput. Sci..

[33]  Masaaki Muraki,et al.  An encoding system for a group contribution method , 1992, J. Chem. Inf. Comput. Sci..

[34]  C. Steinbeck,et al.  Recent developments of the chemistry development kit (CDK) - an open-source java library for chemo- and bioinformatics. , 2006, Current pharmaceutical design.

[35]  S. Forsén,et al.  Graphical rules for enzyme-catalysed rate laws. , 1980, The Biochemical journal.

[36]  G. Zhou,et al.  An extension of Chou's graphic rules for deriving enzyme kinetic equations to systems involving parallel reaction pathways. , 1984, The Biochemical journal.

[37]  Kuo-Chen Chou,et al.  Predicting networking couples for metabolic pathways of Arabidopsis , 2006 .

[38]  Chou Kuo-Chen,et al.  Graphical rules of steady-state reaction systems , 1981 .

[39]  James M. Keller,et al.  A fuzzy K-nearest neighbor algorithm , 1985, IEEE Transactions on Systems, Man, and Cybernetics.

[40]  K. Chou Graphic rule for drug metabolism systems. , 2010, Current drug metabolism.

[41]  Thierry Denoeux,et al.  A k-nearest neighbor classification rule based on Dempster-Shafer theory , 1995, IEEE Trans. Syst. Man Cybern..

[42]  K. Chou,et al.  Using Functional Domain Composition and Support Vector Machines for Prediction of Protein Subcellular Location* , 2002, The Journal of Biological Chemistry.

[43]  Patricia C. Babbitt,et al.  Quantitative Comparison of Catalytic Mechanisms and Overall Reactions in Convergently Evolved Enzymes: Implications for Classification of Enzyme Function , 2010, PLoS Comput. Biol..

[44]  Kiyoko F. Aoki-Kinoshita,et al.  From genomics to chemical genomics: new developments in KEGG , 2005, Nucleic Acids Res..

[45]  K C Chou,et al.  Kinetics of processive nucleic acid polymerases and nucleases. , 1994, Analytical biochemistry.

[46]  Kuo-Chen Chou,et al.  Designing Inhibitors of M2 Proton Channel against H1N1 Swine Influenza Virus , 2010, PloS one.

[47]  Y. Martin,et al.  Do structurally similar molecules have similar biological activity? , 2002, Journal of medicinal chemistry.

[48]  K. Chou Graphic rules in steady and non-steady state enzyme kinetics. , 1989, The Journal of biological chemistry.

[49]  Kohji Fukunaga,et al.  Introduction to Statistical Pattern Recognition-Second Edition , 1990 .

[50]  Kuo-Chen Chou,et al.  QuatIdent: a web server for identifying protein quaternary structural attribute by fusing functional domain and sequential evolution information. , 2009, Journal of proteome research.

[51]  David Myers,et al.  Microcomputer tools for steady-state enzyme kinetics , 1985, Comput. Appl. Biosci..

[52]  K. Chou,et al.  Predicting protein fold pattern with functional domain and sequential evolution information. , 2009, Journal of theoretical biology.

[53]  J. Chou,et al.  Steady-state kinetic studies with the non-nucleoside HIV-1 reverse transcriptase inhibitor U-87201E. , 1993, The Journal of biological chemistry.

[54]  B. Matthews Comparison of the predicted and observed secondary structure of T4 phage lysozyme. , 1975, Biochimica et biophysica acta.

[55]  Sándor Pongor,et al.  The SBASE protein domain library, release 8.0: a collection of annotated protein sequence segments , 2001, Nucleic Acids Res..

[56]  Kuo-Chen Chou,et al.  Binding of CYP2C9 with diverse drugs and its implications for metabolic mechanism. , 2009, Medicinal chemistry (Shariqah (United Arab Emirates)).

[57]  K. Chou,et al.  Predicting the quaternary structure attribute of a protein by hybridizing functional domain composition and pseudo amino acid composition , 2009 .

[58]  Chuan Wang,et al.  Classification of protein quaternary structure by functional domain composition , 2006, BMC Bioinformatics.

[59]  Robert D. Finn,et al.  InterPro: the integrative protein signature database , 2008, Nucleic Acids Res..

[60]  K. Chou,et al.  Support vector machines for predicting membrane protein types by using functional domain composition. , 2003, Biophysical journal.

[61]  Kuo-Chen Chou,et al.  A New Method for Predicting the Subcellular Localization of Eukaryotic Proteins with Both Single and Multiple Sites: Euk-mPLoc 2.0 , 2010, PloS one.

[62]  Keinosuke Fukunaga,et al.  Introduction to Statistical Pattern Recognition , 1972 .

[63]  Yoshihiro Yamanishi,et al.  KEGG for linking genomes to life and the environment , 2007, Nucleic Acids Res..