Selection of generative models in classification

This paper addresses the selection of a generative model for supervised classification. Classical criteria for model selection assess the fit of a model rather than its ability to produce a low classification error rate. A new criterion, the Bayesian entropy criterion (BEC), is proposed. This criterion takes into account the decisional purpose of a model by minimizing the integrated classification entropy. It provides an interesting alternative to the cross-validated error rate, which is computationally expensive. The asymptotic behavior of BEC is presented. Numerical experiments on both simulated and real data sets show that BEC outperforms the Bayesian information criterion (BIC) at selecting a model that minimizes the classification error rate, and that it performs comparably to the cross-validated error rate.
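The contrast drawn above between a fit-based criterion and the cross-validated error rate can be illustrated with a small sketch. It does not implement BEC, whose formula is not given in this abstract. It only compares BIC (computed here on the joint likelihood of features and labels) with a leave-one-out error rate for two candidate generative models. The one-dimensional Gaussian class densities, the two candidate models (shared versus class-specific variance), the synthetic data, and all function names are assumptions made for illustration.

```python
import math
import random

def fit(xs, ys, shared_variance):
    """Maximum-likelihood fit of 1-D Gaussian class-conditional densities."""
    classes = sorted(set(ys))
    n = len(xs)
    prior, mean, var = {}, {}, {}
    for c in classes:
        pts = [x for x, y in zip(xs, ys) if y == c]
        prior[c] = len(pts) / n
        mean[c] = sum(pts) / len(pts)
        # Small floor guards against a degenerate zero variance.
        var[c] = max(sum((x - mean[c]) ** 2 for x in pts) / len(pts), 1e-12)
    if shared_variance:
        # Pooled ML variance: within-class squared deviations divided by n.
        pooled = sum(prior[c] * var[c] for c in classes)
        var = {c: pooled for c in classes}
    k = len(classes)
    # Free parameters: (k - 1) priors, k means, 1 or k variances.
    n_params = (k - 1) + k + (1 if shared_variance else k)
    return prior, mean, var, n_params

def log_joint(x, c, model):
    """log p(x, c) under the fitted generative model."""
    prior, mean, var, _ = model
    return (math.log(prior[c]) - 0.5 * math.log(2 * math.pi * var[c])
            - (x - mean[c]) ** 2 / (2 * var[c]))

def bic(xs, ys, model):
    """BIC on the joint likelihood; with this sign convention, lower is better."""
    loglik = sum(log_joint(x, y, model) for x, y in zip(xs, ys))
    return -2 * loglik + model[3] * math.log(len(xs))

def predict(x, model):
    """Assign x to the class with the highest joint (hence posterior) probability."""
    return max(model[0], key=lambda c: log_joint(x, c, model))

def loo_error(xs, ys, shared_variance):
    """Leave-one-out error rate: one refit per observation, hence the cost."""
    errors = 0
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], shared_variance)
        errors += predict(xs[i], model) != ys[i]
    return errors / len(xs)

# Synthetic two-class data (an assumption, not data from the paper).
random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(60)] + \
     [random.gauss(3.0, 2.0) for _ in range(60)]
ys = [0] * 60 + [1] * 60

for shared in (True, False):
    model = fit(xs, ys, shared)
    name = "shared variance" if shared else "class-specific variance"
    print(f"{name}: BIC = {bic(xs, ys, model):.1f}, "
          f"LOO error = {loo_error(xs, ys, shared):.3f}")
```

BIC needs a single fit per candidate model, whereas the leave-one-out loop refits the model once per observation. That is the computational cost the abstract refers to, and it grows quickly for models that are expensive to fit.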
