Bayesian learning of sparse classifiers

Bayesian approaches to supervised learning use priors on the classifier parameters. However, few priors aim at achieving "sparse" classifiers, where irrelevant/redundant parameters are automatically set to zero. Two well-known ways of obtaining sparse classifiers are the use of a zero-mean Laplacian prior on the parameters and the "support vector machine" (SVM). Whether one uses a Laplacian prior or an SVM, one still needs to specify/estimate the parameters that control the degree of sparseness of the resulting classifier. We propose a Bayesian approach to learning sparse classifiers which does not involve any parameters controlling the degree of sparseness. This is achieved by a hierarchical-Bayes interpretation of the Laplacian prior, followed by the adoption of a Jeffreys' non-informative hyper-prior. Implementation is carried out by an EM algorithm. Experimental evaluation of the proposed method shows that it performs competitively with (and often better than) the best classification techniques available.
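
The hierarchical construction mentioned above can be sketched as follows. The abstract gives no formulas, so the notation here is an assumption made for illustration: β_i for a classifier parameter, τ_i for its hidden variance, γ for the Laplacian scale parameter, H for the design matrix, and a probit link with latent targets z and labels y_i in {0, 1}. The sketch relies on standard results (a Laplacian is a scale mixture of zero-mean Gaussians with exponentially distributed variances; the Jeffreys prior on a variance is proportional to 1/τ) and should not be read as the paper's exact derivation.

```latex
% Laplacian prior as a two-level hierarchy (scale mixture of Gaussians)
p(\beta_i \mid \tau_i) = \mathcal{N}(\beta_i \mid 0, \tau_i), \qquad
p(\tau_i \mid \gamma) = \frac{\gamma}{2}\exp\!\left(-\frac{\gamma\,\tau_i}{2}\right), \quad \tau_i \ge 0
\;\;\Longrightarrow\;\;
p(\beta_i \mid \gamma) = \int_0^\infty p(\beta_i \mid \tau_i)\, p(\tau_i \mid \gamma)\, d\tau_i
 = \frac{\sqrt{\gamma}}{2}\exp\!\left(-\sqrt{\gamma}\,|\beta_i|\right)

% Replacing the exponential hyper-prior by the Jeffreys prior removes \gamma
p(\tau_i) \propto \frac{1}{\tau_i}

% E-step (treating \tau_i and z as missing data), given the current estimate \hat{\beta}
\mathrm{E}\!\left[\tau_i^{-1} \mid \hat{\beta}_i\right] = \hat{\beta}_i^{-2},
\qquad
\mathrm{E}\!\left[z_i \mid \hat{\beta}, y_i\right]
 = \mu_i + s_i\,\frac{\phi(\mu_i)}{\Phi(s_i \mu_i)},
\quad \mu_i = h_i^{\top}\hat{\beta}, \;\; s_i = 2y_i - 1

% M-step (unit-variance latent noise assumed), with V = \mathrm{diag}(\hat{\beta}_1^{-2}, \dots, \hat{\beta}_k^{-2})
\hat{\beta}^{\text{new}} = \left(V + H^{\top}H\right)^{-1} H^{\top}\, \mathrm{E}\!\left[z \mid \hat{\beta}, y\right]
```

Under these assumptions no sparseness-controlling parameter remains: the weight β̂_i^{-2} grows without bound as β̂_i approaches zero, so small parameters are penalized ever more strongly and driven to exactly zero.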
