Improving classification models when a class hierarchy is available

Babak Shahbaba
Doctor of Philosophy
Graduate Department of Public Health Sciences
University of Toronto
2007

We introduce a new method for modeling hierarchical classes when we have prior knowledge of how these classes can be arranged in a hierarchy. We discuss the application of this approach to linear models, as well as to nonlinear models based on Dirichlet process mixtures. Our method uses a Bayesian form of the multinomial logit (MNL) model, with a prior that introduces correlations between the parameters for classes that are nearby in the hierarchy.

Using simulated data, we compare the performance of the new method with that of the ordinary MNL model and of a hierarchical model based on a set of nested MNL models. We find that when classes have a hierarchical structure, models that acknowledge this structure can perform better than a model that ignores it (i.e., MNL). We also show that our model is more robust to misspecification of the class structure than the alternative hierarchical model. Moreover, we test the new method on page layout analysis and document classification problems, and find that it performs better than the other methods.

Our original motivation for conducting this research was classification of gene functions. Here, we investigate whether functional annotation of genes can be improved using the hierarchical structure of functional classes.

We also introduce a new nonlinear model for classification, in which we model the joint distribution of the response variable, y, and the covariates, x, non-parametrically using Dirichlet process mixtures. In this approach, we keep the relationship between y and x linear within each component of the mixture. The overall relationship becomes
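The abstract does not spell out how the prior is built, so the following is only an illustrative sketch of one way a prior can correlate the MNL parameters of classes that are nearby in a hierarchy. It assumes each class's coefficient vector is the sum of independent zero-mean Gaussian parameters attached to the branches on the path from the root to that class. The four-class tree, the branch names, the variances, and all function names are hypothetical and chosen for illustration.

```python
import math
import random

# Hypothetical hierarchy: each class maps to the branches on its path from
# the root. Classes 0 and 1 share branch "A"; classes 2 and 3 share "B".
PATHS = {
    0: ["A", "A0"],
    1: ["A", "A1"],
    2: ["B", "B0"],
    3: ["B", "B1"],
}

# Assumed prior variance of the parameter attached to each branch.
BRANCH_VAR = {"A": 1.0, "B": 1.0, "A0": 0.25, "A1": 0.25, "B0": 0.25, "B1": 0.25}


def prior_covariance(paths, branch_var):
    """Prior covariance between class coefficients for a single covariate.

    With independent zero-mean branch parameters, the covariance of two
    class coefficients is the summed variance of the branches they share.
    """
    classes = sorted(paths)
    return [
        [sum(branch_var[b] for b in set(paths[ci]) & set(paths[cj]))
         for cj in classes]
        for ci in classes
    ]


def sample_class_coefficients(paths, branch_var, n_features, rng):
    """Draw branch parameters, then sum them along each class's path."""
    phi = {
        b: [rng.gauss(0.0, math.sqrt(v)) for _ in range(n_features)]
        for b, v in branch_var.items()
    }
    return {
        c: [sum(phi[b][k] for b in path) for k in range(n_features)]
        for c, path in paths.items()
    }


def mnl_probabilities(x, beta):
    """Multinomial logit class probabilities for one covariate vector x."""
    scores = {c: sum(w * xi for w, xi in zip(coef, x)) for c, coef in beta.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    expd = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(expd.values())
    return {c: e / total for c, e in expd.items()}


if __name__ == "__main__":
    for row in prior_covariance(PATHS, BRANCH_VAR):
        print(row)
    beta = sample_class_coefficients(PATHS, BRANCH_VAR, 3, random.Random(0))
    print(mnl_probabilities([1.0, -0.5, 2.0], beta))
```

Under these assumptions, sibling classes (0 and 1) have positively correlated coefficients because they share a branch, while classes in different subtrees (0 and 2) are uncorrelated a priori. A flat MNL prior would correspond to dropping the shared branches "A" and "B".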
