When drug discovery meets web search: Learning to Rank for ligand-based virtual screening

Abstract

Background: The rapid emergence of novel chemical substances creates a substantial demand for more sophisticated computational methodologies for drug discovery. In this study, the idea of Learning to Rank from web search is applied to drug virtual screening. It offers two unique capabilities: (1) it can identify compounds for novel targets when not enough training data are available for those targets, and (2) it can integrate heterogeneous data when compound affinities are measured on different platforms.

Results: A standard pipeline was designed to carry out Learning to Rank in virtual screening. Six Learning to Rank algorithms were investigated based on two public datasets collected from the Binding Database and the newly published Community Structure-Activity Resource benchmark dataset. The results demonstrate that Learning to Rank is an efficient computational strategy for drug virtual screening, particularly owing to its novel use in cross-target virtual screening and heterogeneous data integration.

Conclusions: To the best of our knowledge, this is the first application of Learning to Rank in virtual screening. The experimental workflow and algorithm assessment designed in this study provide a standard protocol for similar studies. All datasets and the implementations of the Learning to Rank algorithms are available at http://www.tongji.edu.cn/~qiliu/lor_vs.html.

Graphical Abstract: The analogy between web search and ligand-based drug discovery
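To illustrate why a ranking formulation can combine affinities measured on different platforms, the following is a minimal sketch of a pairwise ranker. It is not one of the six algorithms evaluated in the study, and it does not reproduce the authors' pipeline. The function names, the perceptron-style update, the group labels, and all feature and affinity values are assumptions made up for illustration. The key point is that preferences are built only within a group (a target or platform), so affinity scales never need to match across groups.

```python
from itertools import combinations


def pairwise_preferences(groups):
    """Build (better, worse) feature pairs within each group only.

    groups maps a group id (hypothetically a target or assay platform) to a
    list of (features, affinity) tuples. Affinities are compared only inside
    a group, so their scales may differ between groups.
    """
    pairs = []
    for compounds in groups.values():
        for (xa, ya), (xb, yb) in combinations(compounds, 2):
            if ya > yb:
                pairs.append((xa, xb))
            elif yb > ya:
                pairs.append((xb, xa))
            # Ties carry no preference and are skipped.
    return pairs


def train_linear_ranker(pairs, n_features, epochs=50, lr=0.1):
    """Perceptron-style pairwise training: aim for w . (better - worse) > 0."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            if margin <= 0:  # pair is mis-ordered (or undecided): update
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


def rank(w, compounds):
    """Return compound indices sorted by descending linear score."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in compounds]
    return sorted(range(len(compounds)), key=lambda i: -scores[i])


# Toy, made-up data: two hypothetical groups with different affinity scales.
groups = {
    "target_A_platform_1": [([1.0, 0.0], 9.0), ([0.5, 0.2], 7.0), ([0.1, 0.9], 5.0)],
    "target_B_platform_2": [([0.9, 0.1], 300.0), ([0.2, 0.8], 100.0)],
}

pairs = pairwise_preferences(groups)
w = train_linear_ranker(pairs, n_features=2)

# Rank compounds for a hypothetical new target with no training labels.
new_compounds = [[0.2, 0.7], [0.8, 0.1], [0.5, 0.5]]
order = rank(w, new_compounds)
print(order)
```

Because the model learns only from within-group orderings, the raw values 9.0 and 300.0 are never compared with each other. The same scoring function can then be applied to compounds of an unseen target, which is the cross-target setting the abstract describes.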
