MLViS: A Web Tool for Machine Learning-Based Virtual Screening in Early-Phase of Drug Discovery and Development

Virtual screening is an important step in the early phase of the drug discovery process. Because thousands of compounds must be evaluated, this step should be both fast and effective at distinguishing drug-like from nondrug-like molecules. Statistical machine learning methods are widely used for classification in drug discovery studies. Here, we aim to develop a new tool that classifies molecules as drug-like or nondrug-like using various machine learning methods, including discriminant, tree-based, kernel-based, ensemble, and other algorithms. To construct this tool, we first compared the performance of twenty-three machine learning algorithms using ten different measures, and then selected the ten best-performing algorithms based on the results of principal component and hierarchical cluster analyses. Besides classification, the application can also create heat maps and dendrograms for visual inspection of the molecules through hierarchical cluster analysis. Moreover, users can connect to the PubChem database to download molecular information and to create two-dimensional structures of compounds. The application is freely available at www.biosoft.hacettepe.edu.tr/MLViS/.
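The comparison of classifiers by several performance measures can be illustrated with a small sketch. The abstract does not name the ten measures, so the six computed below (accuracy, sensitivity, specificity, precision, balanced accuracy, and Matthews correlation coefficient) are assumptions chosen for illustration, as are the function names and the "drug"/"nondrug" labels. This is not the tool's actual implementation.

```python
from math import sqrt


def confusion_counts(y_true, y_pred, positive="drug"):
    """Count true/false positives and negatives for a binary drug/nondrug task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def _ratio(numerator, denominator):
    # Return 0.0 instead of raising when a class is absent from the data.
    return numerator / denominator if denominator else 0.0


def performance_measures(tp, fp, tn, fn):
    """Compute a few common classification measures (illustrative subset only)."""
    sensitivity = _ratio(tp, tp + fn)
    specificity = _ratio(tn, tn + fp)
    mcc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": _ratio(tp, tp + fp),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Hypothetical labels for eight molecules and one classifier's predictions.
y_true = ["drug", "drug", "drug", "nondrug", "nondrug", "nondrug", "nondrug", "drug"]
y_pred = ["drug", "drug", "nondrug", "nondrug", "nondrug", "drug", "nondrug", "drug"]

counts = confusion_counts(y_true, y_pred)
measures = performance_measures(*counts)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Repeating this for each candidate algorithm yields an algorithms-by-measures table, which is the kind of input that principal component and hierarchical cluster analyses could then summarize when selecting the best performers.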
