From bird’s eye views to molecular communities: two-layered visualization of structure–activity relationships in large compound data sets

The analysis of structure–activity relationships (SARs) becomes challenging when large and heterogeneous compound data sets are studied. In such cases, many different compounds and their activities need to be compared, which quickly exceeds the capacity of subjective assessment. Comprehensive large-scale exploration of SARs therefore requires computational analysis and visualization methods. Herein, we introduce a two-layered SAR visualization scheme specifically designed for increasingly large compound data sets. The approach combines a new compound pair-based variant of generative topographic mapping (GTM), a machine learning approach for nonlinear mapping, with chemical space networks (CSNs). The GTM component provides a global view of the activity landscapes of large compound data sets, in which informative local SAR environments are identified, augmented by a numerical SAR scoring scheme. Prioritized local SAR regions are then projected into CSNs that resolve these regions at the level of individual compounds and their relationships. Analysis of CSNs makes it possible to distinguish between regions having different SAR characteristics and to select compound subsets that are rich in SAR information.
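The second layer described above, a network that resolves a prioritized region at the level of individual compounds, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simple threshold-based network in which compounds are nodes and an edge links two compounds whose Tanimoto similarity meets a cutoff. The compound names, the integer feature sets standing in for fingerprints, and the threshold of 0.55 are all hypothetical values chosen for illustration.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity of two feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def build_csn(fingerprints: dict, threshold: float) -> dict:
    """Adjacency map linking compounds whose similarity meets the threshold."""
    edges = {name: set() for name in fingerprints}
    for u, v in combinations(fingerprints, 2):
        if tanimoto(fingerprints[u], fingerprints[v]) >= threshold:
            edges[u].add(v)
            edges[v].add(u)
    return edges


def components(edges: dict) -> list:
    """Connected components of the network, each as a sorted list of names."""
    seen, result = set(), []
    for start in edges:
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            group.append(node)
            for neighbor in edges[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        result.append(sorted(group))
    return result


# Hypothetical toy data: integer sets stand in for structural fingerprints.
fingerprints = {
    "cpd_a": frozenset({1, 2, 3, 4}),
    "cpd_b": frozenset({1, 2, 3, 5}),
    "cpd_c": frozenset({1, 2, 3, 4, 6}),
    "cpd_d": frozenset({10, 11, 12}),
}

csn = build_csn(fingerprints, threshold=0.55)  # threshold is an assumption
clusters = components(csn)
print(clusters)
```

In this toy network, `cpd_a` is linked to both `cpd_b` and `cpd_c`, while `cpd_d` remains a singleton. Annotating nodes with potency values would then allow connected compounds with similar or very different activities to be inspected side by side.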
