A Similarity‐based Data‐fusion Approach to the Visual Characterization and Comparison of Compound Databases

A low‐dimensional method, based on the use of multiple fusion‐based similarity measures, is described for graphically depicting and characterizing relationships among molecules in compound databases. The measures are used to construct multi‐fusion similarity maps that characterize the relationship of a set of ‘test’ molecules to a set of ‘reference’ molecules. The reference set is very general: it can be drawn from, for example, the set of test molecules itself (the self‐referencing case), a small library or large compound collection, or the actives in a given assay or group of assays. The test set is any collection of compounds to be analyzed with respect to the specified reference set. Multiple fusion‐based similarity measures tend to provide more information than a single fusion‐based measure, including information on the nature of the chemical‐space neighborhoods surrounding reference‐set molecules. A general discussion is presented on how to interpret multi‐fusion similarity maps, and several examples are given that illustrate how these maps can be used to compare compound libraries or collections, to select compounds for screening or acquisition, and to identify new active molecules using ligand‐based virtual screening.
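As a rough illustration of the idea, the sketch below places each test molecule on a two-dimensional map using two fused similarity values computed against the reference set. The abstract does not say which fusion rules or fingerprints are used, so the choice of max- and mean-fusion, the Tanimoto coefficient, the set-of-on-bits fingerprint representation, and all function names here are assumptions made for illustration only.

```python
from statistics import mean


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient for fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def multi_fusion_point(test_fp, reference_fps):
    """Fuse one test molecule's similarities to every reference molecule.

    Returns (max-fusion, mean-fusion); both fusion rules are assumed,
    not taken from the paper.
    """
    sims = [tanimoto(test_fp, ref_fp) for ref_fp in reference_fps]
    return max(sims), mean(sims)


def multi_fusion_map(test_fps, reference_fps):
    """One (max, mean) coordinate pair per test molecule."""
    return [multi_fusion_point(fp, reference_fps) for fp in test_fps]


# Toy fingerprints (hypothetical data, not real molecules).
reference = [frozenset({1, 2, 3, 4}), frozenset({3, 4, 5, 6})]
test = [frozenset({1, 2, 3, 4}), frozenset({7, 8})]

points = multi_fusion_map(test, reference)
for max_sim, mean_sim in points:
    print(f"max={max_sim:.3f}  mean={mean_sim:.3f}")
```

Under these assumptions, a test molecule with a high maximum but a low mean resembles only a few reference molecules, whereas one with high values on both axes sits in a densely populated region of the reference set. In the self-referencing case each molecule would presumably be excluded from its own reference list, since otherwise the maximum is trivially 1.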
