Machine Learning Distinguishes with High Accuracy between Pan-Assay Interference Compounds That Are Promiscuous or Represent Dark Chemical Matter.

Assay interference compounds give rise to false positives and cause substantial problems in medicinal chemistry. Nearly 500 compound classes have been designated as pan-assay interference compounds (PAINS), which typically occur as substructures in other molecules. The structural environment of a PAINS substructure is likely to play an important role in its potential reactivity. Given the large number of PAINS and their highly variable structural contexts, it is difficult to study this context dependence on the basis of expert knowledge alone. Hence, we applied machine learning to predict PAINS that are promiscuous and to distinguish them from PAINS that are mostly inactive. Surprisingly accurate models can be derived using different methods such as support vector machines, random forests, or deep neural networks. Moreover, structural features that favor correct predictions have been identified, mapped, and categorized, shedding light on the structural context dependence of PAINS effects. The machine learning models presented herein further extend the capacity of PAINS filters.
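To make the classification task concrete, the following is a minimal sketch of how compounds sharing a PAINS substructure might be separated into promiscuous and mostly inactive classes from their structural context. It is not the method of the study: a simple nearest-neighbor classifier with Tanimoto similarity stands in for the support vector machines, random forests, and deep neural networks named above. The fingerprints, feature numbers, and labels are hypothetical toy data, and the Matthews correlation coefficient is assumed here as the evaluation measure.

```python
from math import sqrt


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity between two sets of fingerprint features."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def predict_1nn(query: frozenset, training: list) -> int:
    """Return the label of the training compound most similar to the query."""
    _, label = max(training, key=lambda item: tanimoto(query, item[0]))
    return label


def mcc(y_true: list, y_pred: list) -> float:
    """Matthews correlation coefficient for binary labels (1 or 0)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy data. Features 1 and 2 stand in for a shared PAINS
# substructure; the remaining features stand in for its structural context.
# Label 1 = promiscuous, label 0 = mostly inactive.
training = [
    (frozenset({1, 2, 3, 4}), 1),
    (frozenset({1, 2, 3, 9}), 1),
    (frozenset({1, 2, 7, 8}), 0),
    (frozenset({1, 2, 6, 7}), 0),
]
test = [
    (frozenset({1, 2, 3, 5}), 1),
    (frozenset({1, 2, 5, 7}), 0),
    (frozenset({1, 2, 4, 9}), 1),
    (frozenset({1, 2, 6, 8}), 0),
]

y_true = [label for _, label in test]
y_pred = [predict_1nn(fp, training) for fp, _ in test]
score = mcc(y_true, y_pred)
print(y_pred, score)
```

In this toy setup all compounds contain the same substructure features, so only the context features can separate the two classes, which mirrors the idea that structural context rather than the PAINS substructure itself determines the outcome.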
