Prediction of compounds’ biological function (metabolic pathways) based on functional group composition

Efficient in silico screening can provide valuable hints about the biological functions of candidate compounds, which helps to identify functional compounds both in basic research on metabolic pathways and in drug discovery. Here, we apply a machine learning method, the Nearest Neighbor Algorithm, to the analysis of metabolic pathways, using the functional group composition of compounds as input. The method can quickly map small molecules to the metabolic pathway to which they most likely belong. A set of 2,764 compounds from 11 major classes of metabolic pathways was selected for study. The overall prediction rate reached 73.3%, indicating that the functional group composition of compounds is related to their biological metabolic functions.
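
The idea of assigning a compound to the pathway class of its most similar known compound can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the study: the five functional groups in `GROUPS`, the toy count vectors and pathway labels in `TRAINING`, and the choice of cosine distance as the similarity measure.

```python
import math

# Hypothetical functional-group vocabulary (illustrative only; not the
# feature set used in the study). Each compound is a vector of counts
# in this order.
GROUPS = ["hydroxyl", "carboxyl", "amino", "phosphate", "carbonyl"]


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0 means identical composition profile."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A compound with no counted groups is treated as maximally distant.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def predict_pathway(query, training):
    """Return the pathway label of the training compound nearest to query."""
    nearest = min(training, key=lambda item: cosine_distance(query, item[0]))
    return nearest[1]


# Toy training set: (functional-group counts, pathway class). Made-up values.
TRAINING = [
    ([5, 0, 0, 0, 1], "carbohydrate metabolism"),
    ([4, 0, 0, 1, 1], "carbohydrate metabolism"),
    ([0, 1, 1, 0, 0], "amino acid metabolism"),
    ([1, 1, 1, 0, 0], "amino acid metabolism"),
    ([2, 0, 1, 3, 1], "nucleotide metabolism"),
]

if __name__ == "__main__":
    query = [4, 0, 0, 0, 1]  # hypothetical sugar-like compound
    print(predict_pathway(query, TRAINING))
```

Cosine distance compares the relative proportions of functional groups rather than their absolute counts, so a larger molecule with the same composition profile is treated as close to a smaller one. Other distance measures could be substituted without changing the structure of the classifier.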
