
Bayesian learning of sparse classifiers

Bayesian approaches to supervised learning use priors on the classifier parameters. However, few priors aim at achieving "sparse" classifiers, in which irrelevant or redundant parameters are automatically set to zero. Two well-known ways of obtaining sparse classifiers are the use of a zero-mean Laplacian prior on the parameters and the "support vector machine" (SVM). Whether one uses a Laplacian prior or an SVM, one still needs to specify or estimate the parameters that control the degree of sparseness of the resulting classifiers. We propose a Bayesian approach to learning sparse classifiers that does not involve any parameters controlling the degree of sparseness. This is achieved by a hierarchical-Bayes interpretation of the Laplacian prior, followed by the adoption of a Jeffreys' non-informative hyper-prior. Implementation is carried out by an EM algorithm. Experimental evaluation of the proposed method shows that it performs competitively with (and often better than) the best classification techniques available.
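The hierarchical-Bayes construction and the EM algorithm described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a probit link with the Albert and Chib latent variable z_i ~ N(h_i . beta, 1), which the abstract does not name. It writes the Laplacian prior as beta_j | tau_j ~ N(0, tau_j) with an exponentially distributed variance tau_j, then replaces that exponential hyper-prior by the Jeffreys prior p(tau_j) proportional to 1/tau_j. Under this prior E[1/tau_j | beta_j] = 1/beta_j^2, so the M-step becomes beta = U (I + U H^T H U)^{-1} U H^T v with U = diag(|beta_j|), and no sparseness parameter appears. The function names (`sparse_probit_em`, `predict`, `mills`, `solve`), the ridge initialization and the iteration count are hypothetical choices made for this example.

```python
import math


def norm_pdf(x):
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal CDF Phi(x); erfc keeps the lower tail accurate."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def mills(m):
    """phi(m) / Phi(m); for very negative m use the leading asymptotic term -m."""
    if m < -30.0:
        return -m
    return norm_pdf(m) / norm_cdf(m)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense A)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def sparse_probit_em(H, y, iters=200):
    """Hypothetical EM sketch: probit likelihood, Laplacian prior written as a
    Gaussian scale mixture, Jeffreys hyper-prior p(tau_j) proportional to 1/tau_j.
    H is an n x k list of feature rows, y a list of 0/1 labels."""
    n, k = len(H), len(H[0])
    eye = [[1.0 if p == q else 0.0 for q in range(k)] for p in range(k)]
    # Gram matrix G = H^T H, fixed across iterations.
    G = [[sum(H[i][p] * H[i][q] for i in range(n)) for q in range(k)] for p in range(k)]
    # Simple start: ridge regression (penalty 1) on +/-1 targets.
    t = [2.0 * yi - 1.0 for yi in y]
    Ht = [sum(H[i][p] * t[i] for i in range(n)) for p in range(k)]
    beta = solve([[G[p][q] + eye[p][q] for q in range(k)] for p in range(k)], Ht)
    for _ in range(iters):
        # E-step for the latent z_i ~ N(h_i . beta, 1), truncated to z_i > 0
        # when y_i = 1 and to z_i <= 0 when y_i = 0.
        v = []
        for i in range(n):
            m = sum(H[i][p] * beta[p] for p in range(k))
            v.append(m + mills(m) if y[i] == 1 else m - mills(-m))
        # E-step for tau_j: under the Jeffreys prior E[1/tau_j | beta_j] = 1/beta_j^2,
        # which enters the M-step through U = diag(|beta_j|).
        u = [abs(bj) for bj in beta]
        Hv = [sum(H[i][p] * v[i] for i in range(n)) for p in range(k)]
        # M-step: beta = U (I + U G U)^{-1} U H^T v; a zero beta_j stays zero.
        A = [[eye[p][q] + u[p] * G[p][q] * u[q] for q in range(k)] for p in range(k)]
        x = solve(A, [u[p] * Hv[p] for p in range(k)])
        beta = [u[p] * x[p] for p in range(k)]
    return beta


def predict(H, beta):
    """Probit estimate of P(y = 1 | h) = Phi(h . beta) for each row of H."""
    return [norm_cdf(sum(h * b for h, b in zip(row, beta))) for row in H]


if __name__ == "__main__":
    # Hypothetical toy data: column 0 separates the classes, columns 1-2 are noise-like.
    H = [[-2.0, 0.3, 1.0], [-1.5, -0.4, -0.2], [-1.0, 0.1, 0.7],
         [1.0, -0.2, 0.4], [1.5, 0.5, -0.9], [2.0, -0.1, 0.3]]
    y = [0, 0, 0, 1, 1, 1]
    beta = sparse_probit_em(H, y)
    print("beta:", [round(b, 4) for b in beta])
    print("P(y=1):", [round(p, 3) for p in predict(H, beta)])
```

Because U multiplies both sides of the linear solve, a coefficient that reaches exactly zero stays at zero in every later iteration; this is how the sketch produces sparse estimates without a tuning parameter. The plain-Python solver is only meant for a small number of features.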

Anil K. Jain | Mário A. T. Figueiredo
