Predicting biodegradation products and pathways: a hybrid knowledge- and machine learning-based approach

MOTIVATION: Current methods for the prediction of biodegradation products and pathways of organic environmental pollutants either do not take into account domain knowledge or do not provide probability estimates. In this article, we propose a hybrid knowledge- and machine learning-based approach to overcome these limitations in the context of the University of Minnesota Pathway Prediction System (UM-PPS). The proposed solution performs relative reasoning in a machine learning framework, and obtains one probability estimate for each biotransformation rule of the system. As the application of a rule then depends on a threshold for the probability estimate, the trade-off between recall (sensitivity) and precision (selectivity) can be addressed and leveraged in practice.

RESULTS: Results from leave-one-out cross-validation show that a recall and precision of approximately 0.8 can be achieved for a subset of 13 transformation rules. Therefore, it is possible to optimize precision without compromising recall. We are currently integrating the results into an experimental version of the UM-PPS server.

AVAILABILITY: The program is freely available on the web at http://wwwkramer.in.tum.de/research/applications/biodegradation/data.

CONTACT: kramer@in.tum.de.
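The thresholding idea described above can be illustrated with a minimal sketch. It is not the UM-PPS implementation: the rule names, the probability values and the helper functions `apply_rules` and `precision_recall` are all hypothetical, and the toy data are invented purely to show how moving the threshold trades recall against precision.

```python
from typing import Dict, List, Tuple


def apply_rules(probabilities: Dict[str, float], threshold: float) -> List[str]:
    """Return the (hypothetical) rules whose estimated probability reaches the threshold."""
    return sorted(rule for rule, p in probabilities.items() if p >= threshold)


def precision_recall(
    predictions: List[Tuple[float, bool]], threshold: float
) -> Tuple[float, float]:
    """Compute precision and recall for (probability, is_correct) pairs at a threshold."""
    tp = sum(1 for p, correct in predictions if p >= threshold and correct)
    fp = sum(1 for p, correct in predictions if p >= threshold and not correct)
    fn = sum(1 for p, correct in predictions if p < threshold and correct)
    # Convention assumed here: an empty denominator yields 1.0.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall


# Invented toy data: (estimated probability, whether the rule application was correct).
TOY_PREDICTIONS: List[Tuple[float, bool]] = [
    (0.95, True),
    (0.85, True),
    (0.70, False),
    (0.60, True),
    (0.40, False),
    (0.30, True),
    (0.10, False),
]

if __name__ == "__main__":
    # Sweep a few thresholds to see the trade-off.
    for t in (0.2, 0.5, 0.8):
        precision, recall = precision_recall(TOY_PREDICTIONS, t)
        print(f"threshold={t:.1f} precision={precision:.2f} recall={recall:.2f}")

    # Hypothetical rule names with made-up probability estimates.
    print(apply_rules({"rule_a": 0.9, "rule_b": 0.3, "rule_c": 0.55}, 0.5))
```

On this toy data, raising the threshold from 0.5 to 0.8 lifts precision from 0.75 to 1.0 while recall drops from 0.75 to 0.5, which is the kind of trade-off a user of per-rule probability estimates can tune.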
