Target Fishing for Chemical Compounds Using Target-Ligand Activity Data and Ranking Based Methods

In recent years, the development of computational techniques that identify all the likely targets of a given chemical compound, a problem known as target fishing, has been an active area of research. Identifying the likely targets of a compound in the early stages of drug discovery helps in understanding issues such as selectivity, off-target pharmacology, and toxicity. In this paper, we present a set of techniques that rank, or prioritize, targets for a given chemical compound so that the targets against which the compound may show activity appear near the top of the ranked list. These methods are based on our extensions to the SVM and ranking perceptron algorithms for this problem. Our extensive experimental study shows that the methods developed in this work outperform previous approaches by 2% to 60% under different evaluation criteria.
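To make the ranking formulation concrete, the following is a minimal sketch of a generic ranking perceptron, not the extensions developed in the paper. It assumes one linear scorer per target, compounds given as numeric descriptor vectors, and a binary compound-by-target activity matrix; the function names, the margin, and the learning rate are illustrative assumptions.

```python
import numpy as np

def train_ranking_perceptron(X, Y, epochs=20, margin=1.0, lr=1.0):
    """Illustrative ranking perceptron (hypothetical helper, not the paper's method).

    X: (n_compounds, n_features) descriptor matrix.
    Y: (n_compounds, n_targets) binary matrix, 1 = compound is active on target.
    Returns W with one weight vector per target.
    """
    W = np.zeros((Y.shape[1], X.shape[1]))
    for _ in range(epochs):
        for x, y in zip(X, Y):
            active = np.flatnonzero(y == 1)
            inactive = np.flatnonzero(y == 0)
            if active.size == 0 or inactive.size == 0:
                continue  # no active/inactive pair to order for this compound
            scores = W @ x
            # Most violated pair: lowest-scored active vs. highest-scored inactive target.
            worst_active = active[np.argmin(scores[active])]
            best_inactive = inactive[np.argmax(scores[inactive])]
            if scores[worst_active] - scores[best_inactive] < margin:
                W[worst_active] += lr * x    # push the active target up
                W[best_inactive] -= lr * x   # push the inactive target down
    return W

def rank_targets(W, x):
    """Return target indices sorted from highest to lowest score for compound x."""
    return np.argsort(-(W @ x), kind="stable")

# Toy data (made up for illustration): 2 compounds, 2 features, 3 targets.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
Y = np.array([[1, 0, 0], [0, 1, 0]])
W = train_ranking_perceptron(X, Y)
top_target_first = rank_targets(W, X[0])[0]
top_target_second = rank_targets(W, X[1])[0]
```

Each update acts only on the single most violated active/inactive pair, which keeps the sketch short; a category-ranking perceptron can instead spread the update over all violated pairs.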
