An Ontology for Pharmaceutical Ligands and Its Application for in Silico Screening and Library Design

Annotation efforts in the biosciences have in recent years focused mainly on genomic sequences, and only very limited effort has been put into annotation schemes for pharmaceutical ligands. Here we propose annotation schemes for the ligands of four major target classes: enzymes, G protein-coupled receptors (GPCRs), nuclear receptors (NRs), and ligand-gated ion channels (LGICs). We also outline their use for in silico screening and combinatorial library design. The proposed schemes cover ligand functionality and hierarchical levels of target classification. The classification schemes are based on those established by the Enzyme Commission (EC), GPCRDB, NucleaRDB, and LGICdb. The ligands of the MDL Drug Data Report (MDDR) database serve as a reference data set of known pharmacologically active compounds. All ligands were annotated according to the schemes whenever an assignment was possible based on the activity classification provided by the reference database. The purpose of the ligand-target classification schemes is to allow annotation-based searching of the ligand database. In addition, the biological sequence information of the target can be linked directly to the ligand, thereby allowing sequence similarity-based identification of ligands of the most closely related homologous receptors. Ligands annotated at a specified level of the hierarchy can easily be retrieved to serve as comprehensive reference sets for cheminformatics-based similarity searches and for the design of target-class-focused compound libraries. Retrospective in silico screening experiments within the MDDR01.1 database searched for structures binding to dopamine D2, to all dopamine receptors, and to all amine-binding class A GPCRs, using known dopamine D2 binding compounds as the reference set. These experiments showed that such reference sets are particularly useful for identifying ligands that bind to receptors closely related to the reference system. The potential for ligand identification drops with increasing phylogenetic distance.
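The retrieval and screening idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Ligand` class, the five-level target path, the ligand names, and the toy set-based fingerprints are all hypothetical stand-ins, and the Tanimoto coefficient is used here only as a common similarity measure that the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ligand:
    name: str
    # Hierarchical target annotation, most general level first (illustrative levels).
    target_path: tuple
    # Toy fingerprint: set of "on" feature ids, standing in for a structural descriptor.
    fingerprint: frozenset


def ligands_at_level(ligands, prefix):
    """Return all ligands whose annotation starts with the given hierarchy prefix."""
    prefix = tuple(prefix)
    return [lig for lig in ligands if lig.target_path[:len(prefix)] == prefix]


def tanimoto(a, b):
    """Tanimoto coefficient of two feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_by_reference(candidates, reference):
    """Score each candidate by its best similarity to any reference ligand."""
    scored = [
        (max(tanimoto(c.fingerprint, r.fingerprint) for r in reference), c.name)
        for c in candidates
    ]
    return sorted(scored, key=lambda s: (-s[0], s[1]))


# Hypothetical annotated ligands (names and features are made up).
LIGANDS = [
    Ligand("lig_a", ("GPCR", "class A", "amine", "dopamine", "D2"), frozenset({1, 2, 3, 4})),
    Ligand("lig_b", ("GPCR", "class A", "amine", "dopamine", "D2"), frozenset({1, 2, 3, 5})),
    Ligand("lig_c", ("GPCR", "class A", "amine", "dopamine", "D3"), frozenset({1, 2, 4, 6})),
    Ligand("lig_d", ("GPCR", "class A", "amine", "serotonin", "5-HT1A"), frozenset({2, 6, 7, 8})),
    Ligand("lig_e", ("GPCR", "class A", "peptide", "opioid", "mu"), frozenset({9, 10, 11})),
]

# Reference set: everything annotated as a dopamine D2 ligand.
reference = ligands_at_level(LIGANDS, ("GPCR", "class A", "amine", "dopamine", "D2"))
candidates = [lig for lig in LIGANDS if lig not in reference]
ranking = rank_by_reference(candidates, reference)

for score, name in ranking:
    print(f"{name}: {score:.2f}")
```

In this toy data the candidate annotated to the neighbouring dopamine D3 receptor ranks above the serotonin and peptide receptor ligands, which mirrors the qualitative trend reported in the abstract; the numbers themselves carry no meaning beyond the example.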
A second application scenario analyzes the focus of a tertiary amine-based combinatorial library relative to known ligands of amine-binding class A GPCRs, peptide-binding class A GPCRs, and LGICs, and illustrates how the focus of a combinatorial library can be treated quantitatively. The provided annotation schemes, which bridge chem- and bioinformatics by linking ligands to sequences, are expected to be of key utility for further systematic chemogenomics exploration of previously well explored target families.
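One way to treat library focus quantitatively is sketched below in Python. This is an assumption made for illustration, not the measure used in the paper: `focus_fraction`, the 0.5 threshold, and the toy feature sets are hypothetical. The sketch reports, for each annotated reference set, the fraction of library members whose nearest reference ligand exceeds a similarity threshold.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def focus_fraction(library, reference, threshold=0.5):
    """Fraction of library members with a nearest reference neighbour at or above threshold.

    The threshold value is an arbitrary choice for this example.
    """
    if not library or not reference:
        return 0.0
    hits = sum(
        1
        for member in library
        if max(tanimoto(member, ref) for ref in reference) >= threshold
    )
    return hits / len(library)


# Toy library and reference sets; feature ids are made up.
library = [
    frozenset({1, 2, 3}),
    frozenset({1, 2, 4}),
    frozenset({7, 8, 9}),
    frozenset({1, 3, 5}),
]
reference_sets = {
    "amine-binding class A GPCR": [frozenset({1, 2, 3, 4}), frozenset({1, 3, 5, 6})],
    "peptide-binding class A GPCR": [frozenset({7, 8, 9, 10})],
    "LGIC": [frozenset({11, 12, 13})],
}

focus = {
    label: focus_fraction(library, refs) for label, refs in reference_sets.items()
}

for label, value in focus.items():
    print(f"{label}: {value:.2f}")
```

Comparing the per-class fractions gives a simple profile of where a library is concentrated; a nearest-neighbour fraction is only one of several plausible choices, and a different descriptor or threshold would change the values.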
