A Representation to Apply Usual Data Mining Techniques to Chemical Reactions - Illustration on the Rate Constant of SN2 Reactions in Water

Chemical reactions always involve several molecules of two types: reactants and products. Existing data mining techniques, e.g., Quantitative Structure-Activity Relationship (QSAR) methods, deal with individual molecules only. In this article, we propose using a Condensed Graph of Reaction (CGR) to merge all molecules involved in a reaction into a single molecular graph. This allows reactions to be treated as pseudo-molecules, so that QSAR models can be developed from fragment descriptors. ISIDA (In SIlico Design and Analysis) fragment descriptors built from CGRs are then used to generate models for the rate constant of SN2 reactions in water, using three common attribute-value regression algorithms (linear regression, support vector machines, and regression trees). This approach compares favorably with two state-of-the-art relational data mining techniques.
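
To make the idea of a condensed graph concrete, the following is a minimal sketch rather than the ISIDA implementation. It assumes that atoms carry a reaction-wide mapping number, that bonds are given as order tables keyed by mapped atom pairs, and that a changed bond is labelled `before>after`; the function names `condense` and `fragment_counts`, the label notation, and the restriction to two- and three-atom sequences are all illustrative choices. The toy reaction is Cl- + CH3Br -> CH3Cl + Br-, with hydrogens omitted.

```python
from collections import Counter
from itertools import combinations


def condense(reactant_bonds, product_bonds):
    """Merge mapped reactant and product bond tables into one labelled graph.

    Keys are (i, j) atom-map pairs with i < j; values are bond orders.
    Unchanged bonds keep their order; changed bonds are labelled 'before>after'.
    """
    cgr = {}
    for pair in set(reactant_bonds) | set(product_bonds):
        before = reactant_bonds.get(pair, 0)
        after = product_bonds.get(pair, 0)
        cgr[pair] = str(before) if before == after else f"{before}>{after}"
    return cgr


def canonical(forward, backward):
    """Pick one spelling for a sequence that can be read in two directions."""
    return min(forward, backward)


def fragment_counts(atoms, cgr):
    """Count two-atom and three-atom sequences in the condensed graph."""
    neighbours = {}
    counts = Counter()
    for (a, b), label in cgr.items():
        neighbours.setdefault(a, []).append((b, label))
        neighbours.setdefault(b, []).append((a, label))
        counts[canonical(f"{atoms[a]}({label}){atoms[b]}",
                         f"{atoms[b]}({label}){atoms[a]}")] += 1
    for centre, linked in neighbours.items():
        for (x, lx), (y, ly) in combinations(linked, 2):
            counts[canonical(
                f"{atoms[x]}({lx}){atoms[centre]}({ly}){atoms[y]}",
                f"{atoms[y]}({ly}){atoms[centre]}({lx}){atoms[x]}")] += 1
    return counts


# Toy SN2 reaction (hypothetical encoding): atom map 1 = C, 2 = Br, 3 = Cl.
atoms = {1: "C", 2: "Br", 3: "Cl"}
reactant_bonds = {(1, 2): 1}   # C-Br present before the reaction
product_bonds = {(1, 3): 1}    # C-Cl present after the reaction

cgr = condense(reactant_bonds, product_bonds)
descriptors = fragment_counts(atoms, cgr)
```

Each key of `descriptors` can serve as one column of an attribute-value table, with one row per reaction, which is the form that standard regression algorithms expect.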
