Predicting human intestinal absorption with modified random forest approach: a comprehensive evaluation of molecular representation, unbalanced data, and applicability domain issues

As drug discovery processes grow in complexity and risk, the prediction of human intestinal absorption (HIA) has become increasingly important. Several predictive models have been constructed to estimate the HIA of new drug-like compounds with acceptable accuracy, but some issues remain to be explored, including the limited and unbalanced HIA data, the performance of different types of descriptors, and the applicability domain of published models. To address these problems, we collected a relatively large dataset of 970 compounds and calculated 9 different types of descriptors for modeling. In every modeling run, a parameter named samplesize in the random forest (RF) method was used to balance the dataset. Classification models were then established from different training sets and different combinations of descriptors. By comparing the statistical results of these models, we explored the aforementioned problems, evaluated the reliability of existing HIA classification models, and obtained a robust and applicable model based on a combination of 2D, 3D, N+ and Nrule-of-five descriptors (training set: SE = 0.892, SP = 0.846; test set: SE = 0.877, SP = 0.813). Compared with other published models, our model shows some advantages in data size, accuracy and practicability. This structure–activity relationship model is useful for HIA prediction and could serve as a convenient tool for virtual screening in the early stages of drug development.
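The balancing idea described above can be illustrated with a minimal sketch. It assumes that the sample-size parameter makes each tree see the same number of compounds from each class, and that SE and SP denote sensitivity and specificity; neither assumption is spelled out in this abstract. The function names, the toy labels, and the per-class size of 2 are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def balanced_bootstrap(labels, per_class, rng):
    """Draw `per_class` indices with replacement from every class.

    Illustrative only: mimics the idea of giving each tree an
    equal-sized sample per class, not the authors' actual code.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    sample = []
    for label in sorted(by_class):
        sample.extend(rng.choices(by_class[label], k=per_class))
    return sample


def sensitivity_specificity(y_true, y_pred):
    """Return (SE, SP) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up toy labels: 8 well-absorbed (1) versus 2 poorly absorbed (0).
toy_labels = [1] * 8 + [0] * 2
rng = random.Random(0)
sample_idx = balanced_bootstrap(toy_labels, per_class=2, rng=rng)

# Each class contributes exactly `per_class` draws, whatever its raw size.
n_pos = sum(toy_labels[i] for i in sample_idx)
print(n_pos, len(sample_idx) - n_pos)

# Made-up predictions, only to show how SE and SP are computed.
se, sp = sensitivity_specificity([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
print(round(se, 3), round(sp, 3))
```

Reporting SE and SP separately, rather than overall accuracy alone, is what makes the effect of balancing visible: on an unbalanced set, a classifier that favors the majority class can score high accuracy while having poor specificity.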
