StructRank: A New Approach for Ligand-Based Virtual Screening

Screening large libraries of chemical compounds against a biological target, typically a receptor or an enzyme, is a crucial step in drug discovery. Virtual screening (VS) can be seen as a ranking problem in which as many active compounds as possible should appear at the top of the ranking. Current Quantitative Structure-Activity Relationship (QSAR) models typically apply regression methods to predict the activity level of each molecule and then sort the molecules by predicted activity to establish the ranking. In this paper, we propose a top-k ranking algorithm (StructRank) based on Support Vector Machines that solves the early recognition problem directly. Empirically, we show that our ranking approach outperforms not only regression methods but also RankSVM, another ranking approach recently proposed for QSAR ranking, in terms of actives found.
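The regression-then-sort baseline described above, and the "actives found" criterion, can be illustrated with a minimal sketch. It is not the StructRank algorithm itself. The function names, the predicted activity values, and the activity labels are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def rank_by_predicted_activity(scores: Sequence[float]) -> List[int]:
    """Return compound indices sorted from highest to lowest predicted activity."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def actives_in_top_k(ranking: Sequence[int], is_active: Sequence[bool], k: int) -> int:
    """Count how many of the first k ranked compounds are labeled active."""
    return sum(1 for i in ranking[:k] if is_active[i])


# Hypothetical predicted activities, as any regression model might produce them.
predicted = [6.1, 4.2, 7.8, 5.0, 6.9, 3.3]
# Hypothetical ground-truth activity labels for the same six compounds.
labels = [True, False, True, False, False, False]

ranking = rank_by_predicted_activity(predicted)
top3_hits = actives_in_top_k(ranking, labels, k=3)
print(ranking, top3_hits)
```

In this toy case the regression errors place an inactive compound in the top three. A top-k ranking method aims to reduce exactly this kind of error at the head of the list, rather than to minimize prediction error over all compounds.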
