Similarity Metrics for Ligands Reflecting the Similarity of the Target Proteins

In this study we evaluate how far the scope of similarity searching can be extended to identify not only ligands binding to the same target as the reference ligand(s) but also ligands of other homologous targets without initially known ligands. This "homology-based similarity searching" requires molecular representations that reflect the ability of a molecule to interact with target proteins. The Similog keys, introduced here as a new molecular representation, were designed to fulfill this requirement. They are based only on the molecular constitution and are counts of atom triplets. Each triplet is characterized by the graph distances between its atoms and by the types of those atoms. The atom-typing scheme classifies each atom by its function as H-bond donor or acceptor and by its electronegativity and bulkiness. The Similog keys are investigated in retrospective in silico screening experiments and compared with other conformation-independent molecular representations. The molecules studied were taken from the MDDR database, whose activity data were augmented with standardized target classification information from public protein classification databases. The MDDR molecule set was split randomly into two halves. The first half formed the candidate set. Ligands of four targets (dopamine D2 receptor, opioid delta-receptor, factor Xa serine protease, and progesterone receptor) were taken from the second half to form the respective reference sets. Different similarity calculation methods were used to rank the molecules of the candidate set by their similarity to each of the four reference sets. The accumulated counts of molecules binding to the reference target, and to groups of targets with decreasing homology to it, were examined as a function of the similarity rank for each reference set and similarity method.
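As an illustration of what "counts of atom triplets characterized by graph distances and atom types" can mean in practice, the following minimal Python sketch counts typed triplets on a molecular graph. It is not the Similog implementation: the atom-type labels ("D", "A", "N"), the lack of any distance binning or cutoff, and the function names are assumptions made only for this example.

```python
from collections import Counter, deque
from itertools import combinations


def graph_distances(adjacency):
    """All-pairs shortest path lengths (in bonds) via breadth-first search."""
    dist = {}
    for start in adjacency:
        seen = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen[neighbor] = seen[node] + 1
                    queue.append(neighbor)
        dist[start] = seen
    return dist


def triplet_key_counts(adjacency, atom_types):
    """Count atom triplets, keyed by atom types and pairwise graph distances.

    Each atom is paired with the distance between the other two atoms of the
    triplet; sorting the three pairs makes the key independent of atom order.
    The type labels are hypothetical placeholders, not the published scheme.
    """
    dist = graph_distances(adjacency)
    counts = Counter()
    for i, j, k in combinations(sorted(adjacency), 3):
        # Skip triplets that span disconnected fragments.
        if j not in dist[i] or k not in dist[i] or k not in dist[j]:
            continue
        key = tuple(sorted([
            (atom_types[i], dist[j][k]),
            (atom_types[j], dist[i][k]),
            (atom_types[k], dist[i][j]),
        ]))
        counts[key] += 1
    return counts


# Toy four-atom chain 0-1-2-3 with hypothetical types:
# donor (D), two neutral atoms (N), acceptor (A).
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
types = {0: "D", 1: "N", 2: "N", 3: "A"}
keys = triplet_key_counts(chain, types)
```

Because the result is a table of counts and not a set of present keys, two molecules that contain the same triplet types in different numbers remain distinguishable.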
In summary, similarity searching based on Unity 2D-fingerprints and similarity searching based on Similog keys are found to be equally effective in identifying molecules that bind to the same target as the reference set. However, the Similog keys are more effective than the other investigated methods in identifying ligands that bind to any target belonging to the same family as the reference target. We attribute this superiority to two properties: the Similog keys generalize the chemical elements, and the keys are counted instead of merely being recorded as present or absent in binary form. The second most effective molecular representation is the set of occurrence counts of the public ISIS key fragments, which, like the Similog method, incorporates key counting as well as a generalization of the chemical elements. The results suggest that ligands for a new target can be identified by the following three-step procedure: 1. Select at least one target with known ligands that is homologous to the new target. 2. Combine the known ligands of the selected target(s) into a reference set. 3. Search for candidate ligands for the new target by their similarity to the reference set using the Similog method. This clearly enlarges the scope of similarity searching from the classical application for a single target to the identification of candidate ligands for whole target families and is expected to be of key utility for further systematic chemogenomics exploration of previously well explored target families.
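Step 3 of the procedure, together with the accumulated-count evaluation, can be sketched as follows. The abstract does not state which similarity coefficient is used or how similarity to a whole reference set is computed, so this example assumes a count-based (min/max) Tanimoto coefficient and scores each candidate by its most similar reference ligand. All function names and the toy key counts are hypothetical.

```python
def count_tanimoto(a, b):
    """Tanimoto coefficient generalized to key counts: sum(min) / sum(max)."""
    keys = set(a) | set(b)
    shared = sum(min(a.get(k, 0), b.get(k, 0)) for k in keys)
    total = sum(max(a.get(k, 0), b.get(k, 0)) for k in keys)
    return shared / total if total else 0.0


def rank_candidates(candidates, reference_set):
    """Rank candidates by their best similarity to any reference ligand.

    Taking the maximum over the reference set is an assumption made for this
    sketch; other ways of combining reference ligands are possible.
    """
    scored = [
        (name, max(count_tanimoto(keys, ref) for ref in reference_set))
        for name, keys in candidates.items()
    ]
    # Highest similarity first; ties are broken by name for reproducibility.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


def accumulated_hits(ranking, actives):
    """Cumulative number of known actives found down the ranked list."""
    hits, curve = 0, []
    for name, _score in ranking:
        hits += name in actives
        curve.append(hits)
    return curve


# Toy data: key counts for one reference ligand and three candidates.
reference_set = [{"k1": 2, "k2": 1}]
candidates = {
    "m1": {"k1": 2, "k2": 1},
    "m2": {"k1": 1},
    "m3": {"k3": 4},
}
ranking = rank_candidates(candidates, reference_set)
curve = accumulated_hits(ranking, actives={"m1", "m3"})
```

Plotting such a curve separately for ligands of the reference target and for ligands of progressively less homologous targets gives the kind of rank-dependent accumulated counts described above.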
