A Representation to Apply Usual Data Mining Techniques to Chemical Reactions - Illustration on the Rate Constant of SN2 Reactions in Water

Chemical reactions always involve several molecules of two types: reactants and products. Existing data mining techniques, e.g., Quantitative Structure-Activity Relationship (QSAR) methods, deal only with individual molecules. In this article, we propose to use the Condensed Graph of Reaction (CGR) approach, which merges all molecules involved in a reaction into a single molecular graph. This allows one to treat reactions as pseudomolecules and to develop QSAR models based on fragment descriptors. Here, ISIDA fragment descriptors calculated from CGRs have been used to build quantitative models for the rate constant of SN2 reactions in water. Three common attribute-value regression algorithms (linear regression, support vector machines, and regression trees) have been evaluated.
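The abstract states only that a CGR merges reactants and products into one graph and that fragment descriptors are computed from it. The following is a minimal sketch of that idea, not the authors' implementation, under loudly stated assumptions: atoms are already atom-mapped (the same map number on both sides of the reaction), hydrogens, charges and stereochemistry are ignored, and the fragments are plain atom/bond sequences in the spirit of ISIDA sequence fragments rather than the output of the ISIDA software. The function names condensed_graph, bond_token and sequence_fragments are hypothetical.

```python
from collections import Counter


def condensed_graph(reactant_bonds, product_bonds):
    """Merge atom-mapped reactant and product bonds into one CGR (simplified).

    Bonds are keyed by frozenset({map_i, map_j}) with an integer bond order;
    a missing key means "no bond" (order 0). Each CGR edge is labelled with
    (order before, order after), so (1, 0) is a broken single bond and
    (0, 1) is a newly formed one.
    """
    cgr = {}
    for pair in set(reactant_bonds) | set(product_bonds):
        cgr[pair] = (reactant_bonds.get(pair, 0), product_bonds.get(pair, 0))
    return cgr


def bond_token(before, after):
    # Unchanged bonds keep their plain order; dynamic bonds read "before>after".
    return f"[{before}]" if before == after else f"[{before}>{after}]"


def sequence_fragments(atoms, cgr, max_atoms=3):
    """Count atom/bond sequences of 1..max_atoms atoms along simple CGR paths.

    Assumption: this only mimics ISIDA-style sequence fragments; it is not
    the ISIDA descriptor code.
    """
    adjacency = {a: [] for a in atoms}
    for pair, orders in cgr.items():
        i, j = tuple(pair)
        adjacency[i].append((j, orders))
        adjacency[j].append((i, orders))

    counts = Counter()

    def record(path, bonds):
        # Count each undirected path once: single atoms, or paths whose start
        # map number is smaller than their end map number.
        if len(path) > 1 and path[0] > path[-1]:
            return
        tokens = [atoms[path[0]]]
        for atom, orders in zip(path[1:], bonds):
            tokens += [bond_token(*orders), atoms[atom]]
        # The lexicographically smaller reading direction is the canonical name.
        counts[min("".join(tokens), "".join(reversed(tokens)))] += 1

    def extend(path, bonds):
        # Depth-first enumeration of simple paths (no atom visited twice).
        record(path, bonds)
        if len(path) == max_atoms:
            return
        for neighbour, orders in adjacency[path[-1]]:
            if neighbour not in path:
                extend(path + [neighbour], bonds + [orders])

    for start in atoms:
        extend([start], [])
    return counts


if __name__ == "__main__":
    # Hypothetical atom-mapped SN2 example, hydrogens omitted:
    # CH3Cl + OH- -> CH3OH + Cl-
    atoms = {1: "C", 2: "Cl", 3: "O"}
    reactants = {frozenset({1, 2}): 1}  # C-Cl bond present before the reaction
    products = {frozenset({1, 3}): 1}   # C-O bond present after the reaction
    cgr = condensed_graph(reactants, products)
    print(sorted(sequence_fragments(atoms, cgr).items()))
```

On this toy example the fragment counts include the broken C-Cl bond ("C[1>0]Cl"), the formed C-O bond ("C[0>1]O") and the whole reaction-centre sequence "Cl[1>0]C[0>1]O", so the bond changes of the reaction appear directly in the descriptors of a single pseudomolecule. Collecting such counts over a shared fragment vocabulary for many reactions yields an attribute-value table of the kind that linear regression, support vector machine or regression tree learners can be trained on.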
