Binding Activity Prediction of Cyclin-Dependent Inhibitors

The Cyclin-Dependent Kinases (CDKs) are core components that coordinate the eukaryotic cell division cycle. Crystal structures of CDKs generally provide information on possible molecular mechanisms of ligand binding. However, reliable and robust estimation of ligand binding activity remains a challenging task in drug design. Various machine learning techniques, such as the Support Vector Machine, Naive Bayesian classifier, Decision Tree, and K-Nearest Neighbor classifier, have been applied to this problem, but the performance of these heterogeneous classifiers depends on proper selection of features from the data set. This motivated us to propose an integrated classification technique, named the Genetic Algorithm integrated Rotational Ensemble based classification technique, which combines a Genetic Algorithm (GA), a Rotational Feature Selection (RFS) scheme, and an ensemble of machine learning methods to predict the ligand binding activity of CDKs. The technique automatically determines the important features and the ensemble size. For this purpose, the GA encodes the features and the ensemble size in each chromosome as a binary string. The encoded features are then used by RFS to create diverse sets of training points, on which the machine learning method is trained multiple times. The RFS scheme applies Principal Component Analysis (PCA) to rotational, nonoverlapping subsets of the original data in order to preserve their variability information. The test points are then fed to the different trained instances of the machine learning method, whose outputs are combined into the ensemble result. Accuracy, computed with 10-fold cross-validation, is reported as the final result and also serves as the objective function that the GA maximizes. The effectiveness of the proposed technique has been demonstrated quantitatively and visually, in comparison with different machine learning methods, on 16 CDK docking and rescoring data sets for ligand binding.
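The encoding and ensemble steps described above can be sketched in Python. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation: the chromosome layout (a feature mask followed by the ensemble size in binary), the round-robin split of shuffled features into nonoverlapping subsets, plurality voting as the combination rule, and the helper names `decode_chromosome`, `rotational_subsets`, `majority_vote`, and `kfold_indices` are all hypothetical. The per-subset PCA used by RFS and the base classifier itself are omitted to keep the example self-contained.

```python
import random
from collections import Counter
from collections.abc import Hashable, Sequence


def decode_chromosome(bits: str, n_features: int) -> tuple[list[int], int]:
    """Split a binary chromosome into selected feature indices and an ensemble size.

    Hypothetical layout: the first n_features bits are a feature mask (1 = keep);
    the remaining bits encode the ensemble size as an unsigned binary integer.
    """
    if len(bits) <= n_features or set(bits) - {"0", "1"}:
        raise ValueError("expected a binary string longer than n_features")
    mask, size_bits = bits[:n_features], bits[n_features:]
    selected = [i for i, b in enumerate(mask) if b == "1"]
    ensemble_size = max(1, int(size_bits, 2))  # never build an empty ensemble
    return selected, ensemble_size


def rotational_subsets(features: Sequence[int], n_subsets: int,
                       rng: random.Random) -> list[list[int]]:
    """Shuffle the selected features and deal them into disjoint subsets.

    Each ensemble member draws its own partition, so members differ; in RFS a
    separate PCA would then be fitted on the columns of each subset (not shown).
    """
    if n_subsets < 1:
        raise ValueError("n_subsets must be at least 1")
    shuffled = list(features)
    rng.shuffle(shuffled)
    n_subsets = min(n_subsets, len(shuffled))  # no empty subsets
    return [shuffled[i::n_subsets] for i in range(n_subsets)]


def majority_vote(predictions: Sequence[Hashable]) -> Hashable:
    """Combine the labels that all ensemble members predict for one test point."""
    return Counter(predictions).most_common(1)[0][0]


def kfold_indices(n_samples: int, k: int = 10, seed: int = 0) -> list[list[int]]:
    """Shuffle sample indices and split them into k disjoint cross-validation folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


# Example: 7 candidate features followed by 3 ensemble-size bits.
features, size = decode_chromosome("1011010" + "101", n_features=7)
print(features, size)  # [0, 2, 3, 5] 5
rng = random.Random(42)
for member in range(size):
    print(member, rotational_subsets(features, n_subsets=2, rng=rng))
print(majority_vote(["active", "inactive", "active"]))  # active
```

In a full pipeline along these lines, the GA fitness of a chromosome would be the mean ensemble accuracy over the ten folds from `kfold_indices`, with each ensemble member trained on data rotated by a PCA fitted per feature subset.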
In addition, the best possible features have been reported separately for the CDK docking and rescoring data sets. Finally, the Friedman test has been conducted to assess the statistical significance of the results produced by the proposed technique. The results indicate that the integrated classification technique is highly relevant for predicting protein-ligand binding activity.
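The Friedman test ranks the competing methods on each data set and checks whether their average ranks differ more than chance would allow. The sketch below computes the standard average-rank statistic, assuming higher accuracy receives rank 1 and tied scores share the mean rank. The abstract does not say which variant or tie correction was used, and the accuracy table is made up for illustration; it is not taken from the paper's results.

```python
from collections.abc import Sequence


def rank_row(scores: Sequence[float]) -> list[float]:
    """Rank methods on one data set: highest score gets rank 1, ties share the mean rank."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied scores
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def friedman_statistic(table: Sequence[Sequence[float]]) -> float:
    """Friedman chi-square for an N x k table (rows: data sets, columns: methods).

    chi2_F = 12N / (k(k+1)) * (sum_j R_j^2 - k(k+1)^2 / 4), where R_j is the mean
    rank of method j; no tie correction is applied.
    """
    n, k = len(table), len(table[0])
    ranks = [rank_row(row) for row in table]
    mean_ranks = [sum(r[j] for r in ranks) / n for j in range(k)]
    return 12 * n / (k * (k + 1)) * (sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4)


# Illustrative accuracies only (not results from the paper): 3 data sets x 3 methods.
toy = [[0.91, 0.85, 0.80],
       [0.88, 0.84, 0.79],
       [0.93, 0.90, 0.86]]
print(friedman_statistic(toy))  # 6.0
```

The statistic is compared with a chi-square distribution with k - 1 degrees of freedom. SciPy's `scipy.stats.friedmanchisquare` is a library alternative that takes one sequence of scores per method.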
