Classification and its application to drug-target interaction prediction

Classification is one of the most popular and widely used supervised learning tasks; it assigns objects to predefined classes based on prior knowledge. It has long been an important research topic in machine learning and data mining, and many classification methods have been proposed and applied to real-world problems. Unlike unsupervised learning methods such as clustering, a classifier is typically trained on labeled data before being used to make predictions, and it usually achieves higher accuracy than an unsupervised method. In this chapter, we first define classification and then review several representative methods. We then study in detail the application of classification to a critical problem in drug discovery, drug-target interaction prediction, chosen because of the challenges involved in predicting possible interactions between drugs and targets.
