Similarity-Potency Trees: A Method to Search for SAR Information in Compound Data Sets and Derive SAR Rules

An intuitive and generally applicable analysis method, termed the similarity-potency tree (SPT), is introduced to mine structure-activity relationship (SAR) information in compound data sets from any source. Only compound potency values and nearest-neighbor similarity relationships are considered. Instead of analyzing a data set as a whole, the method systematically generates partially overlapping compound neighborhoods and represents them as SPTs. This local analysis scheme simplifies the evaluation of SAR information, and SPTs with high SAR information content are easily identified. By inspecting only a limited number of compound neighborhoods, it is also straightforward to determine whether a data set contains little or no interpretable SAR information. Interactive analysis of SPTs is facilitated by reading the trees in two directions, which makes it possible to extract SAR rules, if available, in a consistent manner. The simplicity and interpretability of the data structure and the ease of calculation are characteristic features of this approach. We apply the methodology to high-throughput screening and lead optimization data sets, compare the approach to standard clustering techniques, illustrate how SAR rules are derived, and provide practical guidance on how best to use the methodology. The SPT program is made freely available to the scientific community.
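To make the idea of a potency-annotated neighborhood tree concrete, the following is a minimal illustrative sketch and not the authors' SPT program. It assumes fingerprints given as sets of on-bits, Tanimoto similarity, a fixed similarity threshold to define a neighborhood, and a simple attachment rule in which each compound is linked to its most similar already-placed compound. All function names and the attachment rule are assumptions made for illustration only.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Fingerprints = Dict[str, FrozenSet[int]]


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def neighborhood(ref: str, fps: Fingerprints, threshold: float) -> List[str]:
    """Reference compound plus all compounds at least `threshold` similar to it."""
    return [c for c in fps
            if c == ref or tanimoto(fps[ref], fps[c]) >= threshold]


def build_tree(ref: str, members: List[str],
               fps: Fingerprints) -> Dict[str, Optional[str]]:
    """Root the tree at `ref`; repeatedly attach the unplaced compound that is
    most similar to any placed compound (assumed rule, not the published one)."""
    parent: Dict[str, Optional[str]] = {ref: None}
    remaining = set(members) - {ref}
    while remaining:
        # Highest similarity wins; ties are broken by compound name.
        _, child, par = max(
            (tanimoto(fps[c], fps[p]), c, p)
            for c in sorted(remaining) for p in parent
        )
        parent[child] = par
        remaining.remove(child)
    return parent


def potency_steps(parent: Dict[str, Optional[str]],
                  potency: Dict[str, float]) -> List[Tuple[str, str, float]]:
    """Potency change along each parent -> child edge of the tree."""
    return [(p, c, potency[c] - potency[p])
            for c, p in parent.items() if p is not None]
```

Reading the edges from root to leaves shows how potency changes as similarity to the reference compound decreases. Building one such tree per reference compound yields the overlapping local views described above.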
