Pharmacophore features for machine learning in pharmaceutical virtual screening

Abstract: Three-dimensional molecular alignment methods generally treat all pharmacophore features equally when superimposing molecules, yet some pharmacophore features can be more important than others in a specific system. In this work, we took the overlap volumes of pharmacophore features computed by a molecular alignment approach and used them as new molecular features for building machine learning models, in which each feature can be assigned a weight indicating its importance. Validated on the DUD-E collection, models based on pharmacophore features represented by overlap volume performed strongly, with a median AUC of approximately 0.98 and a recall of almost 0.8.
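As an illustration of the idea of turning per-feature overlap volumes into a feature vector, the following minimal sketch sums Gaussian overlap volumes between pharmacophore points of the same type in two molecules that are assumed to be already aligned. It is not the authors' implementation: the feature-type names, the shared width `ALPHA`, the amplitude `P`, and the helper names are hypothetical choices made for the example. The overlap formula is the standard closed-form integral of the product of two spherical Gaussians.

```python
import math
from collections import defaultdict

ALPHA = 1.0  # assumed Gaussian exponent shared by all feature points (hypothetical)
P = 1.0      # assumed Gaussian amplitude (hypothetical)


def gaussian_overlap(c1, c2, alpha=ALPHA, p=P):
    """Overlap integral of two spherical Gaussians with equal exponent alpha."""
    d2 = sum((a - b) ** 2 for a, b in zip(c1, c2))
    # General form: p_i * p_j * exp(-a_i*a_j*d^2 / (a_i+a_j)) * (pi / (a_i+a_j))^(3/2)
    return p * p * math.exp(-alpha * d2 / 2.0) * (math.pi / (2.0 * alpha)) ** 1.5


def overlap_features(query, candidate, feature_types):
    """Return one summed overlap volume per pharmacophore feature type.

    query and candidate are lists of (feature_type, (x, y, z)) tuples taken
    from molecules that have already been superimposed.
    """
    totals = defaultdict(float)
    for ftype_q, xyz_q in query:
        for ftype_c, xyz_c in candidate:
            if ftype_q == ftype_c:  # only like features overlap
                totals[ftype_q] += gaussian_overlap(xyz_q, xyz_c)
    return [totals[t] for t in feature_types]


# Toy example with made-up coordinates.
types = ["donor", "acceptor", "aromatic"]
query = [("donor", (0.0, 0.0, 0.0)), ("aromatic", (1.0, 0.0, 0.0))]
candidate = [("donor", (0.0, 0.0, 0.0)), ("acceptor", (1.0, 0.0, 0.0))]
vector = overlap_features(query, candidate, types)
```

A vector like this, computed for each candidate against a query, could then serve as input to a classifier, which is free to weight the feature types differently instead of treating them equally.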
