Comprehensive Analysis of Applicability Domains of QSPR Models for Chemical Reactions

Defining a model’s applicability domain (AD) is an active research topic in chemoinformatics. Although many AD definitions have been described for models that predict properties of molecules (Quantitative Structure-Activity/Property Relationship (QSAR/QSPR) models), none has been reported to date for models of chemical reactions (Quantitative Reaction-Property Relationship (QRPR) models). A chemical reaction is a much more complex object than an individual molecule: its yield and its thermodynamic and kinetic characteristics depend not only on the structures of the reactants and products but also on the experimental conditions. The performance of QRPR models also depends largely on how the chemical transformation is encoded. In this study, AD definition methods widely used in QSAR/QSPR studies of individual molecules, together with several novel approaches proposed here for reactions, were benchmarked on several reaction datasets. The methods were assessed on their ability to exclude wrong reaction types, increase coverage, improve model performance, and detect Y-outliers. As a result, several “best” AD definitions for QRPR models predicting reaction characteristics were identified and tested on a previously published external dataset with a clear AD definition problem.
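To make the benchmarking criteria more concrete, the sketch below shows one classic AD definition from the QSAR literature, a descriptor-range "bounding box", together with two of the quantities mentioned above: coverage and model error inside the AD. The abstract does not say which AD methods were benchmarked, so the choice of a bounding box, the function names, and the toy descriptor values are all illustrative assumptions and not the authors' implementation.

```python
from math import sqrt

# Illustrative sketch only: a bounding-box AD is assumed here as an example;
# the abstract does not specify which AD definitions were used.

def fit_bounding_box(train_X):
    """Return the (min, max) range of each descriptor over the training set."""
    columns = list(zip(*train_X))
    return [(min(col), max(col)) for col in columns]

def in_domain(box, x):
    """A reaction is inside the AD if every descriptor lies within its training range."""
    return all(lo <= value <= hi for (lo, hi), value in zip(box, x))

def coverage_and_rmse(box, test_X, y_true, y_pred):
    """Coverage = fraction of test items inside the AD; RMSE is computed on those items only."""
    inside = [i for i, x in enumerate(test_X) if in_domain(box, x)]
    coverage = len(inside) / len(test_X)
    if not inside:
        return coverage, None
    mse = sum((y_true[i] - y_pred[i]) ** 2 for i in inside) / len(inside)
    return coverage, sqrt(mse)

# Hypothetical descriptor vectors for training and test reactions.
train_X = [[0.0, 1.0], [1.0, 3.0], [2.0, 2.0]]
test_X = [[1.0, 2.0], [3.0, 2.0], [0.5, 1.5], [1.0, 5.0]]
y_true = [1.0, 2.0, 3.0, 4.0]   # hypothetical measured property values
y_pred = [1.5, 4.0, 2.5, 0.0]   # hypothetical model predictions

box = fit_bounding_box(train_X)
coverage, rmse = coverage_and_rmse(box, test_X, y_true, y_pred)
print(f"coverage = {coverage:.2f}, RMSE inside AD = {rmse:.2f}")
```

In this toy case the two test reactions with the largest prediction errors fall outside the training descriptor ranges, so restricting predictions to the AD lowers the error at the cost of reduced coverage. That trade-off is the kind of behaviour an AD benchmark has to weigh.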
