QSAR models for predicting the bioactivity of Polo-like Kinase 1 inhibitors

Abstract: As a member of the serine/threonine kinase family, Polo-like kinase 1 (PLK1) plays a key role in regulating cell cycle progression, particularly mitosis, and has emerged as an important target for cancer therapy. Highly predictive in silico models are therefore needed to predict the bioactivity of PLK1 inhibitors. In this work, 16 single classifier models and one consensus Kohonen's self-organizing map (SOM) model were constructed to discriminate highly active PLK1 inhibitors from poorly active ones on a dataset of 601 noncongeneric PLK1 inhibitors. The 16 single classifier models were built with four machine learning methods: Support Vector Machine (SVM), Naive Bayes (NB), C4.5 Decision Tree (C4.5 DT) and Random Forest (RF). On the test set, their Matthews correlation coefficients (MCCs) ranged from 0.609 to 0.864 and their accuracies from 78.7% to 93.1%. A consensus SOM model was then built on four single classifier models to obtain a more reliable and robust model. The consensus model outperformed all the single classifier models, with an MCC of 0.872 and an accuracy of 93.6% on the test set. In addition, we combined two dataset splitting methods (random and SOM-based) with two feature selection methods to find the best combination. SVMAttributeEval combined with the SOM splitting method gave the best model performance. Finally, 20 good and 20 bad ECFP_4 features were identified, which can help chemists discriminate highly active PLK1 inhibitors from poorly active ones.
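The abstract reports model quality as MCC and accuracy on the test set. As a minimal sketch of how these two metrics are computed for a binary classifier (highly active = 1, poorly active = 0), the snippet below uses the standard definitions. The function names and the toy labels are illustrative assumptions, not data or code from the paper.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = highly active, 0 = poorly active)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of compounds classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy labels (hypothetical, for illustration only).
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
print("accuracy:", accuracy(*counts))
print("MCC:", mcc(*counts))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it remains informative when the two activity classes are unevenly populated.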
