Are the Chemical Structures in Your QSAR Correct?

Quantitative structure–activity relationships (QSARs) are used to predict many different endpoints. They use hundreds or even thousands of different parameters (descriptors) and are built with a variety of approaches. What they all have in common is the assumption that the chemical structures they use are correct. This research tests that assumption by examining six public and private databases that contain structural information for chemicals. Molecular fingerprinting techniques are used to determine the structural error rate of each database. The observed error rates ranged from 0.1% to 3.4%. A case study predicting the n-octanol/water partition coefficient illustrates how these errors affect QSAR predictions. In this case study, QSARs were developed using (i) only correct structures and (ii) structures from a database with an error rate of 3.4%. The case study showed that slight errors in chemical structures, such as a misplaced Cl atom or swapped hydroxy and methoxy groups on a multiple-ring structure, can produce significant differences in prediction accuracy for those chemicals.
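The abstract does not specify which fingerprint or comparison rule was used. As a minimal sketch of the general idea, assume each structure's fingerprint is stored as the set of its "on" bit positions, that records in a reference set and a database are keyed by a shared identifier (for example a CAS Registry Number), and that a record counts as an error when its Tanimoto similarity to the reference falls below a threshold. The names `tanimoto`, `error_rate`, and `threshold` are hypothetical, chosen for illustration only.

```python
from typing import Dict, FrozenSet

# Hypothetical fingerprint representation: the set of "on" bit positions.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient |A & B| / |A | B| of two bit sets."""
    union = len(a | b)
    if union == 0:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / union


def error_rate(reference: Dict[str, Fingerprint],
               database: Dict[str, Fingerprint],
               threshold: float = 1.0) -> float:
    """Fraction of shared identifiers whose database fingerprint differs
    from the reference fingerprint (similarity below `threshold`).

    Assumption: with threshold=1.0 any bit difference counts as an error.
    """
    shared = reference.keys() & database.keys()
    if not shared:
        return 0.0
    mismatches = sum(
        1 for key in shared
        if tanimoto(reference[key], database[key]) < threshold
    )
    return mismatches / len(shared)


# Toy data (invented for illustration, not from the study):
reference = {"A": frozenset({1, 2, 3}), "B": frozenset({4, 5}), "C": frozenset({6})}
database = {"A": frozenset({1, 2, 3}), "B": frozenset({4, 6}), "C": frozenset({6})}
print(error_rate(reference, database))  # one of three records differs
```

Identical fingerprints are necessary but not sufficient for identical structures, because different molecules can set the same bits; a fingerprint check of this kind therefore gives a lower bound on the true error rate unless mismatches and matches are confirmed by a more exact comparison.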
