In Silico Prediction of Blood–Brain Barrier Permeability of Compounds by Machine Learning and Resampling Methods

The blood–brain barrier (BBB), treated here as part of absorption, protects the central nervous system by separating the brain tissue from the bloodstream. In recent years, BBB permeability has become a critical issue in chemical ADMET prediction, but almost all models were built using imbalanced data sets, which caused a high false‐positive rate. We therefore addressed the problem of biased data sets and built a reliable classification model with 2358 compounds. Machine learning and resampling methods were used together to refine the models, with both 2D molecular descriptors and molecular fingerprints used to represent the chemicals. Through a series of evaluations, we found that resampling methods such as the Synthetic Minority Oversampling Technique (SMOTE) and SMOTE+edited nearest neighbor could effectively solve the problem of imbalanced data sets, and that the MACCS fingerprint combined with a support vector machine performed best. After the final construction of a consensus model, the overall accuracy increased to 0.966 on the final external data set. The accuracy of the model on the test set was 0.919, with well-balanced performance: a sensitivity of 0.925 for predicting BBB‐positive compounds and a specificity of 0.899 for predicting BBB‐negative compounds. Compared with other BBB classification models, our models reduced the rate of false positives and were more robust in predicting both BBB‐positive and BBB‐negative compounds, which should be quite helpful in early drug discovery.
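To make the resampling idea and the reported metrics concrete, the following is a minimal sketch of SMOTE-style oversampling and of sensitivity and specificity. It is not the authors' implementation: the function names `smote` and `sensitivity_specificity`, the parameters `k` and `seed`, and the toy data are assumptions chosen for illustration, and labels are assumed to be 1 for BBB‐positive and 0 for BBB‐negative.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples (hypothetical helper).

    Each synthetic point is interpolated between a randomly chosen
    minority-class sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels 1/0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Toy minority class with two made-up features per compound.
toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(toy_minority, n_new=6, k=2)

# Toy predictions: 3 true positives/negatives mixed with two errors.
sens, spec = sensitivity_specificity([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
```

Interpolation of this kind yields fractional values, so applying it to binary fingerprints such as MACCS keys would produce non-binary feature vectors; how that was handled in the study is not stated in the abstract.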
