Survey of Machine Learning Techniques for Prediction of the Isoform Specificity of Cytochrome P450 Substrates.

BACKGROUND: Determining or predicting the absorption, distribution, metabolism, and excretion (ADME) properties of drug candidates, as well as drug-induced toxicity, plays a crucial role in drug discovery and development. Metabolism is one of the most complicated pharmacokinetic properties to understand and predict, and experimental determination of substrate binding, selectivity, and the sites and rates of metabolism is time- and resource-consuming. Cytochrome P450 enzymes play a key role in the phase I metabolism of foreign compounds, which include most drugs. Computational models that predict the ADME properties of drug candidates, particularly of drugs that bind to cytochrome P450, are therefore highly desirable for developing drugs with suitable ADME properties.

OBJECTIVE: This narrative review briefly summarizes the machine learning techniques used to predict the cytochrome P450 isoform specificity of drug candidates.

RESULTS: Both single-label and multi-label classification methods have demonstrated good performance in modelling and predicting the isoform specificity of substrates from their quantitative descriptors.

CONCLUSION: This review provides a guide for researchers developing machine learning-based methods to predict the cytochrome P450 isoform specificity of drug candidates.
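The difference between single-label and multi-label classification of substrates can be illustrated with a minimal sketch. It is not a method taken from the review: it assumes a plain k-nearest-neighbour vote, two made-up descriptor values per compound, and a synthetic training set, and the names `TRAIN`, `predict_single_label`, and `predict_multi_label` are hypothetical. A single-label model assigns each compound to exactly one isoform, whereas a multi-label model (here, one independent yes/no decision per isoform) can report that a compound is a substrate of several isoforms.

```python
from collections import Counter
from math import dist

ISOFORMS = ("CYP3A4", "CYP2D6", "CYP2C9")

# Synthetic toy data (NOT real measurements): each row pairs a vector of two
# made-up descriptors with the set of isoforms that metabolise the compound.
TRAIN = [
    ((0.9, 0.1), {"CYP3A4"}),
    ((0.8, 0.2), {"CYP3A4"}),
    ((0.1, 0.9), {"CYP2D6"}),
    ((0.2, 0.8), {"CYP2D6"}),
    ((0.5, 0.5), {"CYP3A4", "CYP2D6"}),
    ((0.6, 0.5), {"CYP3A4", "CYP2D6"}),
    ((0.1, 0.1), {"CYP2C9"}),
    ((0.2, 0.1), {"CYP2C9"}),
]


def nearest(x, k):
    """Return the k training rows closest to x (Euclidean distance)."""
    return sorted(TRAIN, key=lambda row: dist(x, row[0]))[:k]


def predict_single_label(x, k=3):
    """Single-label: return the one isoform with the most neighbour votes."""
    votes = Counter(iso for _, labels in nearest(x, k) for iso in labels)
    # Ties are broken by the order of ISOFORMS.
    return max(ISOFORMS, key=lambda iso: votes[iso])


def predict_multi_label(x, k=3):
    """Multi-label: decide each isoform independently by majority vote."""
    neighbours = nearest(x, k)
    return {
        iso
        for iso in ISOFORMS
        if 2 * sum(iso in labels for _, labels in neighbours) > k
    }


if __name__ == "__main__":
    query = (0.55, 0.5)  # hypothetical compound between two clusters
    print(predict_single_label(query))
    print(sorted(predict_multi_label(query)))
```

For the query compound, the single-label vote has to pick one isoform, while the multi-label vote can return both isoforms shared by its neighbours, which is the situation multi-label methods are meant to handle.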
