Ligand-Based Target Prediction with Signature Fingerprints

When evaluating a potential drug candidate, it is desirable to predict target interactions in silico before synthesis in order to assess, for example, secondary pharmacology. This can be done by examining the known target binding profiles of similar compounds through chemical similarity searching. The purpose of this study was to construct chemical fingerprints based on the molecular signature descriptor and to evaluate how well they perform in target binding prediction. For the comparison we used the area under the receiver operating characteristic curve (AUC), complemented by the net reclassification improvement (NRI). We created two open-source signature fingerprints, a bit version and a count version. We then compared them with a set of established fingerprints on the task of predicting binding targets, using Tanimoto-based similarity searching on publicly available data sets extracted from ChEMBL. The results showed that the count version of the signature fingerprint performed on par with well-established fingerprints such as ECFP. The count version slightly outperformed the bit version; however, it is more complex and requires more computing time and memory, so whether to use it should probably be decided case by case. The NRI-based tests complemented the AUC-based ones and showed signs of higher statistical power.
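To make the evaluation pipeline concrete, here is a minimal sketch of Tanimoto similarity for bit and count fingerprints, a nearest-neighbour target score, and AUC. It assumes each molecule's signature fingerprint has already been generated by a cheminformatics toolkit and is stored as a collection of atom-signature strings. The strings in the example are placeholders, not real signatures. The count-based Tanimoto shown (sum of per-feature minimum counts over sum of maximum counts) is one common generalization and may differ from the formula used in the study. The scoring rule (highest similarity to any known active of a target) is likewise an assumption, and all function names are hypothetical.

```python
from collections import Counter
from typing import Callable, Iterable, Sequence, Set, TypeVar

FP = TypeVar("FP")  # any fingerprint representation (set for bits, Counter for counts)


def tanimoto_bits(a: Set[str], b: Set[str]) -> float:
    """Tanimoto on bit fingerprints, each stored as the set of features that are 'on'."""
    shared = len(a & b)
    union = len(a) + len(b) - shared
    return shared / union if union else 0.0


def tanimoto_counts(a: Counter, b: Counter) -> float:
    """Assumed count generalization: sum of per-feature minima over sum of maxima."""
    keys = set(a) | set(b)
    num = sum(min(a[k], b[k]) for k in keys)  # Counter returns 0 for missing keys
    den = sum(max(a[k], b[k]) for k in keys)
    return num / den if den else 0.0


def target_score(query: FP, actives: Iterable[FP],
                 similarity: Callable[[FP, FP], float]) -> float:
    """Hypothetical nearest-neighbour score: highest similarity to any known active."""
    return max((similarity(query, ref) for ref in actives), default=0.0)


def roc_auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as one half, which matches the Mann-Whitney formulation of AUC.
    """
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Placeholder signature strings; real atom signatures would come from a toolkit.
    a = Counter({"sigC": 3, "sigO": 1})
    b = Counter({"sigC": 1, "sigO": 1})
    print(tanimoto_bits(set(a), set(b)))              # 1.0: same features present
    print(tanimoto_counts(a, b))                      # 0.5: occurrence counts differ
    print(target_score(a, [b, a], tanimoto_counts))   # 1.0
    print(roc_auc([0.9, 0.7], [0.8, 0.2, 0.7]))       # 0.75
```

In the toy example the two bit fingerprints are identical because both molecules contain the same signatures, while the count version separates them by how often each signature occurs. This illustrates the trade-off the study reports: counts carry more information but must be stored and compared per feature, which costs more time and memory.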