A novel adaptive ensemble classification framework for ADME prediction

In silico prediction of ADME (absorption, distribution, metabolism, and elimination) characteristics is now recognized as an important component of the drug discovery process, and in silico ADME modeling has therefore attracted considerable interest in recent years. Despite advances in this field, challenges remain when class imbalance and high dimensionality must be handled simultaneously. In this work, we introduce a novel adaptive ensemble classification framework, named AECF, to deal with these issues. AECF comprises four components: (1) data balancing, (2) generating individual models, (3) combining individual models, and (4) optimizing the ensemble. We considered five sampling methods, seven base modeling techniques, and ten ensemble rules to build a choice pool. The proper route for constructing predictive models is determined automatically according to the imbalance ratio (IR). Because of this adaptive behavior, AECF can be applied to different kinds of ADME data, with balanced data treated as a special case. We evaluated our approach using five extensive ADME datasets concerning Caco-2 cell permeability (CacoP), human intestinal absorption (HIA), oral bioavailability (OB), and P-glycoprotein (P-gp) binders (substrates and inhibitors, PS and PI). On two independent datasets, the average AUC values of AECF were 0.8574–0.8602, 0.8968–0.9182, 0.7821–0.7981, 0.8139–0.8311, and 0.8874–0.8898 for CacoP, HIA, OB, PS, and PI, respectively. Our results show that AECF provides better performance and generality than individual models and two representative ensemble methods, bagging and boosting. Furthermore, we investigated the degree of complementarity among the AECF ensemble members to elucidate the potential advantages of the framework. We found that AECF can effectively select complementary members to construct predictive models through its auto-adaptive optimization approach, and that the additional diversity in both sample and feature space mainly contributes to the complementarity of the ensemble members.
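
To make the IR-driven routing and the combination step more concrete, the following is a minimal sketch and not the AECF implementation. It assumes the common definition of the imbalance ratio as the majority-class count divided by the minority-class count. The 1.5 routing threshold, the route names, and every function name are hypothetical. The four combination rules and the majority vote shown are widely used fixed rules and may not match the ten rules in the AECF choice pool.

```python
from collections import Counter
from math import prod
from statistics import mean


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (1.0 means balanced)."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes")
    return max(counts.values()) / min(counts.values())


def choose_route(ir, threshold=1.5):
    """Hypothetical routing rule; the 1.5 cut-off is an assumption, not from the paper."""
    return "balance_then_ensemble" if ir > threshold else "ensemble_only"


# Common fixed rules for fusing per-model positive-class probabilities.
COMBINERS = {
    "mean": mean,
    "max": max,
    "min": min,
    "product": prod,
}


def combine(probabilities, rule="mean"):
    """Fuse the probabilities predicted by the individual models for one compound."""
    return COMBINERS[rule](probabilities)


def majority_vote(probabilities, cutoff=0.5):
    """Return 1 if more than half of the models predict the positive class, else 0."""
    votes = sum(p >= cutoff for p in probabilities)
    return int(votes * 2 > len(probabilities))


if __name__ == "__main__":
    labels = [1] * 10 + [0] * 40          # toy data: 10 positives, 40 negatives
    ir = imbalance_ratio(labels)
    print(ir, choose_route(ir))
    member_probs = [0.2, 0.6, 0.7]        # toy outputs of three ensemble members
    print(combine(member_probs, "mean"), majority_vote(member_probs))
```

In this sketch a balanced dataset (IR close to 1) simply skips the balancing step, which mirrors the statement that balanced data is a special case of the framework.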
