Extension of pQSAR: Ensemble Model Generated by Random Forest and Partial Least Squares Regressions

Quantitative structure-activity relationship (QSAR) regression models are mathematical models that relate the structural properties of chemicals to the potencies of their biological activities. In QSAR models, the physical and chemical information of a molecule is encoded as numerical values called descriptors. Recently, experimental test results (profiles) have also been used as descriptors of chemicals. The Profile-QSAR 2.0 (pQSAR) model proposed by Martin et al. is a multitask, two-step machine learning prediction method that combines random forest regressions (RFRs) with partial least squares regressions (PLSRs). In the pQSAR model, the missing values of the profile table are first filled in by RFRs, and PLSRs are then built on the resulting profile predictions. Note that in the second step of the pQSAR method, the predictor variables of the PLSRs are profiles, that is, activity values, and the response variables are also activity values. The PLSRs can therefore be used to update the profile table, after which the second step can be repeated. In this work, we propose an extended pQSAR model generated by RFRs and PLSRs. Starting from the full, initially predicted profile table, we iteratively updated the table with the two kinds of prediction models, RFRs and PLSRs, on the PKIS and ChEMBL data sets. Although the prediction performance of the individual combinations of RFRs and PLSRs varies, the average of all possible predicted profile tables at a given iteration performs better. This ensemble model achieves better prediction performance than the pQSAR model in terms of Pearson’s $R^{2}$.

[1] Eric J. Martin, et al. Profile-QSAR 2.0: Kinase Virtual Screening Accuracy Comparable to Four-Concentration IC50s for Realistically Novel Compounds, 2017, J. Chem. Inf. Model.

[2] Robert P. Sheridan, et al. Random Forest: A Classification and Regression Tool for Compound Classification and QSAR Modeling, 2003, J. Chem. Inf. Comput. Sci.

[3] T. Willson, et al. Seeding Collaborations to Advance Kinase Science with the GSK Published Kinase Inhibitor Set (PKIS), 2014, Current topics in medicinal chemistry.

[4] Robert P. Sheridan, et al. Deep Neural Nets as a Method for Quantitative Structure-Activity Relationships, 2015, J. Chem. Inf. Model.

[5] Mark J. van der Laan, et al. The relative performance of ensemble methods with deep convolutional neural networks for image classification, 2017, Journal of applied statistics.

[6] Thomas Hartung, et al. Nonanimal Models for Acute Toxicity Evaluations: Applying Data-Driven Profiling and Read-Across, 2019, Environmental health perspectives.

[7] Ivan Rusyn, et al. The Use of Cell Viability Assay Data Improves the Prediction Accuracy of Conventional Quantitative Structure Activity Relationship Models of Animal Carcinogenicity, 2007.

[8] Xin Liu, et al. All-Assay-Max2 pQSAR: Activity Predictions as Accurate as Four-Concentration IC50s for 8558 Novartis Assays, 2019, J. Chem. Inf. Model.

[9] Daniela Trisciuzzi, et al. A New Approach for Drug Target and Bioactivity Prediction: The Multifingerprint Similarity Search Algorithm (MuSSeL), 2018, J. Chem. Inf. Model.

[10] C. Hansch, et al. A New Substituent Constant, π, Derived from Partition Coefficients, 1964.

[11] A. Leo, et al. Substituent constants for correlation analysis in chemistry and biology, 1979.

[12] Maik Moeller, et al. An Introduction To Chemoinformatics, 2016.

[13] Sungroh Yoon, et al. Comprehensive ensemble in QSAR prediction for drug discovery, 2019, BMC Bioinformatics.

[14] Alexander Golbraikh, et al. Predictive QSAR modeling: Methods and applications in drug discovery and chemical risk assessment, 2012.

[15] Min Wu, et al. Computational Prediction of Drug-Target Interactions via Ensemble Learning, 2018, Methods in molecular biology.

[16] Jianyu Long, et al. Evolving Deep Echo State Networks for Intelligent Fault Diagnosis, 2020, IEEE Transactions on Industrial Informatics.

[17] David M. Rocke, et al. Predicting ligand binding to proteins by affinity fingerprinting, 1995, Chemistry & biology.

[18] Oleksandr Makeyev, et al. Neural network with ensembles, 2010, The 2010 International Joint Conference on Neural Networks (IJCNN).

[19] F. Burden, et al. Robust QSAR models using Bayesian regularized neural networks, 1999, Journal of medicinal chemistry.

[20] Marvin Johnson, et al. Concepts and applications of molecular similarity, 1990.

[21] M. Cronin, et al. The Impact of variable selection on the modelling of oestrogenicity, 2005, SAR and QSAR in environmental research.

[22] Juan M. Luco, et al. QSAR Based on Multiple Linear Regression and PLS Methods for the Anti-HIV Activity of a Large Group of HEPT Derivatives, 1997, J. Chem. Inf. Comput. Sci.

[23] Gerard J P van Westen, et al. Drug Discovery Maps, a Machine Learning Model That Visualizes and Predicts Kinome–Inhibitor Interaction Landscapes, 2018, J. Chem. Inf. Model.

[24] Richard J. Povinelli, et al. An ensemble model of QSAR tools for regulatory risk assessment, 2016, Journal of Cheminformatics.

[25] Tomasz Puzyn, et al. Multi-Objective Genetic Algorithm (MOGA) As a Feature Selecting Strategy in the Development of Ionic Liquids' Quantitative Toxicity-Toxicity Relationship Models, 2018, J. Chem. Inf. Model.

[26] Hao Zhu, et al. CIIPro: a new read-across portal to fill data gaps using public large-scale chemical and biological data, 2016, Bioinform.

[27] Marc C. Nicklaus, et al. QSAR Modeling and Prediction of Drug-Drug Interactions, 2016, Molecular pharmaceutics.

[28] Jianyu Long, et al. A Novel Sparse Echo Autoencoder Network for Data-Driven Fault Diagnosis of Delta 3-D Printers, 2020, IEEE Transactions on Instrumentation and Measurement.

[29] Thomas G. Dietterich. Multiple Classifier Systems, 2000, Lecture Notes in Computer Science.

[30] Frank R. Burden, et al. Use of Automatic Relevance Determination in QSAR Studies Using Bayesian Neural Networks, 2000, J. Chem. Inf. Comput. Sci.

[31] R. Stephenson. A and V, 1962, The British journal of ophthalmology.

[32] Eric J. Martin, et al. Profile-QSAR: A Novel meta-QSAR Method that Combines Activities across the Kinase Family To Accurately Predict Affinity, Selectivity, and Cellular Activity, 2011, J. Chem. Inf. Model.

[33] George Papadatos, et al. The ChEMBL bioactivity database: an update, 2013, Nucleic Acids Res.