Quantifying the Fingerprint Descriptor Dependence of Structure-Activity Relationship Information on a Large Scale

It is well known that different molecular representations, e.g., graphs, numerical descriptors, fingerprints, or 3D models, change the numerical results of molecular similarity calculations. Because the assessment of structure-activity relationships (SARs) requires similarity and potency comparisons of active compounds, this representation dependence inevitably also affects SAR analysis. But to what extent? How exactly does SAR information change when alternative fingerprints are used as descriptors? What proportion of active compounds shows substantial changes in SAR information when different fingerprints are used? To answer these questions, we quantified changes in SAR information across many different compound classes using six different fingerprints. SAR profiling was carried out on 128 target-based data sets comprising more than 60,000 compounds with high-confidence activity annotations. A numerical measure of SAR discontinuity was applied to assess SAR information on a per-compound basis. For ~70% of all test compounds, changes in SAR characteristics were detected when different fingerprints were used as molecular representations. Moreover, the SAR phenotype of ~30% of the compounds changed, and distinct fingerprint-dependent local SAR environments were detected. The fingerprints we compared were found to generate SAR models that were essentially not comparable. Atom environment and pharmacophore fingerprints produced the largest differences in compound-associated SAR information. Taken together, the results of our systematic analysis reveal larger fingerprint-dependent changes in compound-associated SAR information than would have been anticipated.
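To make the idea of a per-compound discontinuity measure concrete, the following is a minimal sketch and not the measure used in the study, whose exact definition is not given here. It assumes fingerprints are represented as sets of on-bit indices, similarity is the Tanimoto coefficient, and a compound's score is the mean similarity-weighted potency difference to its neighbors above a similarity threshold. The function names, the threshold of 0.5, and the toy data are all hypothetical.

```python
from typing import FrozenSet, List


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def discontinuity_scores(
    fps: List[FrozenSet[int]],
    potencies: List[float],
    sim_threshold: float = 0.5,  # assumed cutoff, chosen for illustration
) -> List[float]:
    """Illustrative per-compound score: mean of |potency difference| * similarity
    over all neighbors whose similarity reaches the threshold (0.0 if none)."""
    scores = []
    for i in range(len(fps)):
        terms = []
        for j in range(len(fps)):
            if i == j:
                continue
            sim = tanimoto(fps[i], fps[j])
            if sim >= sim_threshold:
                terms.append(abs(potencies[i] - potencies[j]) * sim)
        scores.append(sum(terms) / len(terms) if terms else 0.0)
    return scores


# Toy data: the same three compounds (potencies as pKi-like values)
# encoded by two hypothetical fingerprints that group them differently.
potencies = [9.0, 6.0, 6.5]
fp_a = [frozenset({1, 2, 3, 4}), frozenset({1, 2, 3, 5}), frozenset({7, 8, 9})]
fp_b = [frozenset({1, 2}), frozenset({3, 4}), frozenset({1, 2, 5})]

scores_a = discontinuity_scores(fp_a, potencies)
scores_b = discontinuity_scores(fp_b, potencies)

# Compounds whose score differs between the two representations.
changed = [i for i, (x, y) in enumerate(zip(scores_a, scores_b)) if abs(x - y) > 1e-9]
print(scores_a, scores_b, changed)
```

In this toy case the two fingerprints assign different neighbors to each compound, so every compound's score changes with the representation. This is the kind of fingerprint-dependent shift in local SAR information that the study quantifies at scale.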
