Pros and cons of virtual screening based on public "Big Data": In silico mining for new bromodomain inhibitors.

The virtual screening (VS) study described herein aimed at detecting novel binders of the bromodomain-containing protein 4 (BRD4). It relied on knowledge from public databases (ChEMBL, REAXYS) to establish a battery of predictive models of BRD activity for the in silico selection of putative ligands. Beyond the discovery of new BRD ligands, this represented an opportunity to estimate in practice the real usefulness of public-domain "Big Data" for building robust predictive models. The resulting models were used to virtually screen 2 million compounds from the Enamine company collection. This industrial partner then experimentally screened a subset of 2992 molecules selected by the VS procedure for their high likelihood of being active. Twenty-nine hits were confirmed after experimental testing, representing about 1% of the selected candidates. As a general conclusion, this study emphasizes once more that public structure-activity databases are nowadays key assets in drug discovery. Their usefulness is, however, limited by the state-of-the-art knowledge harvested so far by published studies. Target-specific structure-activity information is rarely rich enough, and its heterogeneity makes it extremely difficult to exploit in rational drug design. Furthermore, the published affinity measures used to build the models that select compounds for experimental screening may correlate poorly with the experimental hit selection criterion (which, in practice, is often imposed by equipment constraints). Nevertheless, a robust 2.6-fold increase in hit rate with respect to an equivalent random screening campaign showed that machine learning is able to extract some real knowledge in spite of all the noise in structure-activity data.
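The hit-rate figures above can be checked with a few lines of arithmetic. The sketch below computes the VS hit rate from the reported 29 hits among 2992 tested molecules, and then back-calculates the random-screening hit rate implied by the stated 2.6-fold increase. The helper names `hit_rate` and `enrichment_factor` are hypothetical, and the baseline is an inference from the abstract's numbers (assuming the fold increase was computed against the 29/2992 rate), not a value measured in the study.

```python
def hit_rate(hits: int, tested: int) -> float:
    """Fraction of experimentally tested compounds confirmed as hits."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return hits / tested


def enrichment_factor(vs_rate: float, random_rate: float) -> float:
    """Ratio of the VS hit rate to the hit rate of a random selection."""
    if random_rate <= 0:
        raise ValueError("random_rate must be positive")
    return vs_rate / random_rate


# Figures reported in the abstract.
HITS = 29
TESTED = 2992
REPORTED_FOLD = 2.6

vs_rate = hit_rate(HITS, TESTED)

# Assumption: the 2.6-fold figure is relative to the 29/2992 rate, so the
# implied random baseline is simply vs_rate / 2.6 (inferred, not measured).
implied_random_rate = vs_rate / REPORTED_FOLD

# Hits one would expect from picking the same number of compounds at random.
expected_random_hits = implied_random_rate * TESTED

print(f"VS hit rate: {vs_rate:.2%}")
print(f"Implied random hit rate: {implied_random_rate:.2%}")
print(f"Expected random hits among {TESTED}: {expected_random_hits:.1f}")
print(f"Fold increase check: {enrichment_factor(vs_rate, implied_random_rate):.1f}")
```

Under this reading, a random pick of 2992 Enamine compounds would have been expected to yield roughly 11 hits instead of 29, which is what the 2.6-fold enrichment expresses.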
