Prediction of CYP450 Enzyme-Substrate Selectivity Based on the Network-Based Label Space Division Method

A drug may be metabolized by multiple CYP450 isoforms. Predicting the metabolic fate of drugs is important for preventing drug-drug interactions during the development of novel pharmaceuticals. In this study, prediction of CYP450 enzyme-substrate selectivity is formulated as a multi-label learning task. First, we compared the modeling performance of feature combinations built from four categories of features: physicochemical property descriptors (PC), mol2vec descriptors (M2V), Extended Connectivity Fingerprints (ECFP), and Molecular ACCess System (MACCS) keys fingerprints. After identifying the best feature combination, we applied seven multi-label models: ML-kNN, MLTSVM, and five Network-based Label Space Division (NLSD) methods (NLSD-MLP, NLSD-XGB, NLSD-EXT, NLSD-RF, NLSD-SVM). Six of these models (ML-kNN, NLSD-MLP, NLSD-XGB, NLSD-EXT, NLSD-RF, NLSD-SVM) outperform the previous work. NLSD-XGB achieves the best performance, with an average top-1 prediction success of 91.1%, an average top-2 prediction success of 96.2%, and an average top-3 prediction success of 98.2%. Compared with the previous work, NLSD-XGB shows a significant improvement in top-1 prediction success of more than 11% in the 10-times repeated 5-fold cross-validation test and of more than 14% in the 10-times repeated hold-out test. To the best of our knowledge, this is the first time the Network-based Label Space Division model has been applied to drug metabolism, and it performs well in this task.
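
The top-1, top-2, and top-3 prediction success figures above can be illustrated with a small sketch. It assumes, since the abstract does not spell out the definition, that a compound counts as a success at rank k when at least one of its k highest-scored isoforms is among its true metabolizing isoforms. The function name `top_k_success`, the isoform list, and the toy scores are hypothetical and are not taken from the study.

```python
from typing import Sequence, Set

# Hypothetical isoform order used only to label the columns of the toy scores.
ISOFORMS = ["CYP1A2", "CYP2C9", "CYP2C19", "CYP2D6", "CYP3A4"]


def top_k_success(scores: Sequence[Sequence[float]],
                  true_labels: Sequence[Set[int]],
                  k: int) -> float:
    """Fraction of compounds with at least one true isoform in the top k.

    Assumed definition: a compound is a success when any of its k
    highest-scored isoforms is a true label for that compound.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(scores) != len(true_labels) or not scores:
        raise ValueError("scores and true_labels must be non-empty and aligned")

    hits = 0
    for row, truth in zip(scores, true_labels):
        # Isoform indices sorted by descending predicted score.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if any(j in truth for j in ranked[:k]):
            hits += 1
    return hits / len(scores)


# Toy predicted scores for four compounds (rows) over five isoforms (columns).
toy_scores = [
    [0.1, 0.2, 0.05, 0.6, 0.9],
    [0.7, 0.2, 0.1, 0.3, 0.5],
    [0.3, 0.8, 0.6, 0.1, 0.2],
    [0.2, 0.1, 0.9, 0.4, 0.3],
]
# True isoform indices per compound; a compound may have several.
toy_truth = [{4}, {4}, {0, 3}, {2, 3}]

for k in (1, 2, 3):
    print(f"top-{k} success: {top_k_success(toy_scores, toy_truth, k):.2f}")
```

Under this assumed definition the success rate can only stay equal or increase as k grows, which matches the ordering of the reported top-1, top-2, and top-3 figures.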
