Differential Compound Prioritization via Bi-Directional Selectivity Push with Power

Effective in silico compound prioritization is critical for identifying promising candidates in the early stages of drug discovery. Current methods typically rank compounds by a single property, for example activity, against a single target. However, compound selectivity is also a key property that should be considered at the same time in order to reduce the likelihood of undesired side effects in future drugs. In this paper, we present a novel machine-learning-based differential compound prioritization method, dCPPP. dCPPP learns compound prioritization models that rank active compounds well and, at the same time, preferentially rank selective compounds higher via a bi-directional push strategy. The bi-directional push is enhanced with push powers that are determined by the ranking differences of selective compounds across multiple bioassays. Our experiments demonstrate that dCPPP achieves an overall 19.221% improvement in prioritizing selective compounds over baseline models.
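
To make the idea of a weighted bi-directional push more concrete, the following is a minimal illustrative sketch, not the dCPPP formulation itself. It assumes (the abstract does not spell this out) that "bi-directional" means pushing selective compounds up in the ranking for their target bioassay and down in the ranking for a second bioassay, and that a push power grows with the gap between a compound's ranks in the two bioassays. The function names, the hinge loss, the `alpha` trade-off, and the exact form of the push power are all hypothetical choices made for illustration.

```python
import numpy as np

def rank_positions(values):
    """0-based rank of each item when sorted by descending value (0 = top)."""
    order = np.argsort(-np.asarray(values, dtype=float), kind="stable")
    ranks = np.empty(len(order), dtype=int)
    ranks[order] = np.arange(len(order))
    return ranks

def push_powers(activity_target, activity_other):
    """Hypothetical push power: 1 plus the normalized rank gap between the
    other bioassay and the target bioassay, clipped at zero."""
    gap = rank_positions(activity_other) - rank_positions(activity_target)
    return 1.0 + np.maximum(gap, 0) / len(gap)

def pairwise_hinge(scores, labels, weights=None, margin=1.0):
    """Weighted hinge loss over all pairs (i, j) with labels[i] > labels[j].
    Each pair is weighted by the weight of the higher-labelled item i."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    if weights is None:
        weights = np.ones_like(scores)
    weights = np.asarray(weights, dtype=float)
    diff = scores[:, None] - scores[None, :]           # diff[i, j] = s_i - s_j
    should_outrank = labels[:, None] > labels[None, :]
    losses = np.maximum(0.0, margin - diff) * weights[:, None]
    return float(losses[should_outrank].sum())

def bidirectional_push_loss(scores_target, scores_other, activity_target,
                            selective, powers, alpha=0.5):
    """Activity ranking loss plus an assumed two-way selectivity push."""
    selective = np.asarray(selective, dtype=float)
    # Rank compounds by activity in the target bioassay.
    rank_term = pairwise_hinge(scores_target, activity_target)
    # Push selective compounds above non-selective ones in the target bioassay.
    push_up = pairwise_hinge(scores_target, selective, powers)
    # Push selective compounds below non-selective ones in the other bioassay
    # (negating the scores reverses the desired order).
    push_down = pairwise_hinge(-np.asarray(scores_other, dtype=float),
                               selective, powers)
    return rank_term + alpha * (push_up + push_down)

# Toy data: compound 0 is selective for the target bioassay.
activity_target = [9, 7, 5, 3]
activity_other = [2, 8, 6, 4]
selective = [1, 0, 0, 0]
powers = push_powers(activity_target, activity_other)

good = bidirectional_push_loss([3, 2, 1, 0], [0, 3, 2, 1],
                               activity_target, selective, powers)
bad = bidirectional_push_loss([0, 1, 2, 3], [3, 0, 1, 2],
                              activity_target, selective, powers)
print(powers, good, bad)
```

In this toy setup, a scoring that places the selective compound at the top of the target ranking and at the bottom of the other ranking incurs no loss, while the reversed scoring is penalized, with the penalty on the selective compound scaled by its push power.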
