In silico prediction of cellular permeability of diverse chemicals using qualitative and quantitative SAR modeling approaches

Abstract Permeability of molecules through cultured Caco-2 cells is an established in vitro measure for assessing the absorption of oral drugs. Computational prediction of the cellular permeability of molecules may speed up the screening of new drugs. In this study, gradient boosted tree (GBT) based qualitative and quantitative structure–activity relationship (SAR) models were established for binary classification (moderate–poor versus highly permeable) and for permeability prediction of molecules using a Caco-2 cell dataset. The structural diversity of the chemicals and the nonlinear structure in the data were tested using the similarity index and Brock–Dechert–Scheinkman (BDS) statistics, respectively. The predictive power of the developed SAR models was evaluated through the internal and external validation procedures recommended in the QSAR literature. On the complete data, the qualitative SAR model achieved a classification accuracy of 99.26%, while the quantitative SAR model yielded a correlation (R²) of 0.917 between the measured and predicted permeability values, with a mean squared error (MSE) of 0.08. The results suggest that the developed SAR models can reliably predict the cellular permeability of diverse chemicals in Caco-2 cells and can serve as useful tools for the initial screening of molecules in the drug development process.
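The three figures of merit quoted in the abstract (classification accuracy, R² and MSE) can be illustrated with a minimal Python sketch. It assumes that R² denotes the squared Pearson correlation between measured and predicted values, which is one common reading of "correlation (R²)"; the abstract does not state the exact definition used. The function names and the toy values are hypothetical and are not taken from the study.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of compounds whose predicted class matches the true class."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


def mean_squared_error(measured, predicted):
    """Average squared difference between measured and predicted values."""
    if not measured or len(measured) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((m - p) ** 2 for m, p in zip(measured, predicted)) / len(measured)


def squared_correlation(measured, predicted):
    """Squared Pearson correlation (assumed meaning of R^2 here)."""
    if len(measured) < 2 or len(measured) != len(predicted):
        raise ValueError("need at least two paired values")
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_m == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov ** 2 / (var_m * var_p)


if __name__ == "__main__":
    # Hypothetical toy data: 1 = highly permeable, 0 = moderate-poor.
    classes_true = [1, 0, 1, 1]
    classes_pred = [1, 0, 0, 1]
    # Hypothetical measured and predicted permeability values.
    perm_measured = [1.0, 2.0, 3.0]
    perm_predicted = [1.0, 2.0, 4.0]

    print("accuracy:", accuracy(classes_true, classes_pred))
    print("MSE:", mean_squared_error(perm_measured, perm_predicted))
    print("R^2:", squared_correlation(perm_measured, perm_predicted))
```

Note that the squared correlation is insensitive to systematic bias in the predictions (a model that predicts exactly twice the measured value still scores 1.0), so it is usually read together with an error measure such as the MSE.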
