Structure–reactivity modeling using mixture-based representation of chemical reactions

We describe a new approach to reaction representation in which a reaction is treated as a combination of two mixtures: a mixture of reactants and a mixture of products. Each mixture is encoded using the previously reported simplex descriptors (SiRMS). The feature vector representing the two mixtures is obtained either by concatenating the product and reactant descriptors or by taking the difference between the descriptors of products and reactants. This reaction representation does not require explicit labeling of the reaction center. A rigorous “product-out” cross-validation (CV) strategy is also proposed. Unlike the naïve “reaction-out” CV approach, which is based on a random selection of items, the proposed strategy provides a more realistic estimate of prediction accuracy for reactions that yield novel products. The new methodology has been applied to model rate constants of E2 reactions. The use of the fragment control applicability domain approach was shown to significantly increase the prediction accuracy of the models. The models obtained with the new “mixture” approach performed better than those requiring either explicit (Condensed Graph of Reaction) or implicit (reaction fingerprints) reaction center labeling.
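The two ways of building the reaction feature vector, and the idea behind "product-out" splitting, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a mixture descriptor can be approximated by summing per-molecule fragment counts (a toy stand-in for SiRMS mixture descriptors), and that "product-out" means that all reactions sharing the same product are kept in the same fold. All function names, fragment labels, and product keys are hypothetical.

```python
from collections import Counter, defaultdict


def mixture_descriptor(molecule_vectors):
    """Toy mixture descriptor: sum fragment counts over all molecules.

    Assumption: real SiRMS mixture descriptors are computed differently;
    summation is used here only to keep the example self-contained.
    """
    total = Counter()
    for vec in molecule_vectors:
        total.update(vec)  # adds the counts of each fragment
    return dict(total)


def reaction_vector(reactants, products, mode="difference"):
    """Combine reactant and product mixture descriptors into one vector."""
    keys = sorted(set(reactants) | set(products))
    if mode == "difference":
        # products minus reactants, fragment by fragment
        return {k: products.get(k, 0) - reactants.get(k, 0) for k in keys}
    if mode == "concatenate":
        # keep both blocks, prefixed so that the features do not collide
        vec = {f"R:{k}": reactants.get(k, 0) for k in keys}
        vec.update({f"P:{k}": products.get(k, 0) for k in keys})
        return vec
    raise ValueError(f"unknown mode: {mode!r}")


def product_out_folds(product_keys, n_folds):
    """Split reaction indices so that no product occurs in two folds.

    product_keys[i] is a hashable identifier of the product of reaction i
    (hypothetical here, e.g. a canonical structure string).
    """
    groups = defaultdict(list)
    for idx, key in enumerate(product_keys):
        groups[key].append(idx)
    folds = [[] for _ in range(n_folds)]
    # place the largest groups first, each into the currently smallest fold
    for key in sorted(groups, key=lambda k: (-len(groups[k]), k)):
        min(folds, key=len).extend(groups[key])
    return folds


# Toy E2-like example with made-up fragment labels.
reactants = mixture_descriptor([{"C-Br": 1, "C-H": 5}, {"O-H": 1}])
products = mixture_descriptor([{"C=C": 1, "C-H": 4}, {"O-H": 2}, {"Br": 1}])
diff = reaction_vector(reactants, products, mode="difference")
concat = reaction_vector(reactants, products, mode="concatenate")

# Five reactions, three distinct products, two folds.
product_keys = ["ethene", "ethene", "propene", "butene", "propene"]
folds = product_out_folds(product_keys, n_folds=2)
```

In the difference vector, fragments that are consumed get negative values and fragments that are formed get positive ones, so the changed part of the structure is captured without marking a reaction center. Grouping by product before splitting means that each test fold contains only products the model has not seen in training, which a random "reaction-out" split does not guarantee.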
