A novel Bayesian logistic discriminant model: An application to face recognition

Linear discriminant analysis (LDA) is a linear classifier that has proven powerful and competitive with the main state-of-the-art classifiers. However, LDA assumes that the sample vectors of each class are generated from underlying multivariate normal distributions with a common covariance matrix and different means (i.e., homoscedastic data). This assumption has considerably restricted the use of LDA. Over the years, authors have defined several extensions to the basic formulation of LDA. One such method is heteroscedastic LDA (HLDA), which was proposed to address the heteroscedasticity problem. Another is nonparametric discriminant analysis (NDA), in which the normality assumption is relaxed. In this paper, we propose a novel Bayesian logistic discriminant (BLD) model that addresses both the normality and heteroscedasticity problems. The normality assumption is relaxed by approximating the underlying distribution of each class with a mixture of Gaussians. Hence, the proposed BLD provides more flexibility and better classification performance than LDA, HLDA and NDA. Subclass and multinomial versions of the BLD are also proposed. The posterior distribution of the BLD model is approximated by a tractable Gaussian form using a variational transformation and Jensen's inequality, allowing a straightforward computation of the weights. An extensive comparison of the BLD with LDA, the support vector machine (SVM), HLDA, NDA and subclass discriminant analysis (SDA), performed on artificial and real data sets, shows the advantages of the proposed method. In particular, the experiments on face recognition show a significant improvement of the proposed BLD over LDA.
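The abstract does not give the BLD update equations, so the following is only a minimal sketch of the general idea of replacing a logistic posterior with a tractable Gaussian through a variational bound. It assumes the standard Jaakkola-Jordan quadratic bound on the sigmoid for plain two-class Bayesian logistic regression with a zero-mean isotropic Gaussian prior; it is not the authors' BLD model and omits the mixture-of-Gaussians, subclass and multinomial parts. The function names, the prior variance, and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lam(xi):
    # lambda(xi) = tanh(xi / 2) / (4 * xi); the limit as xi -> 0 is 1/8.
    xi = np.maximum(np.abs(xi), 1e-8)
    return np.tanh(xi / 2.0) / (4.0 * xi)


def fit_variational_logistic(X, y, prior_var=10.0, n_iter=50):
    """Gaussian approximation N(m, S) to the posterior over the weights.

    X: (n, d) design matrix, y: (n,) labels in {0, 1}.
    Assumed prior (not from the paper): w ~ N(0, prior_var * I).
    """
    n, d = X.shape
    S0_inv = np.eye(d) / prior_var
    xi = np.ones(n)  # one variational parameter per training sample
    for _ in range(n_iter):
        # With the quadratic bound, the posterior is Gaussian in closed form.
        S_inv = S0_inv + 2.0 * (X.T * lam(xi)) @ X
        S = np.linalg.inv(S_inv)
        m = S @ (X.T @ (y - 0.5))
        # Tighten the bound: xi_n^2 = x_n^T (S + m m^T) x_n.
        xi = np.sqrt(np.einsum("ni,ij,nj->n", X, S + np.outer(m, m), X))
    return m, S


def make_demo_data(seed=0, n_per_class=100):
    # Hypothetical two-class synthetic data with an appended bias column.
    rng = np.random.default_rng(seed)
    X0 = rng.normal(loc=[-2.0, -2.0], scale=1.0, size=(n_per_class, 2))
    X1 = rng.normal(loc=[2.0, 2.0], scale=1.0, size=(n_per_class, 2))
    X = np.vstack([X0, X1])
    X = np.hstack([X, np.ones((2 * n_per_class, 1))])
    y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])
    return X, y


if __name__ == "__main__":
    X, y = make_demo_data()
    m, S = fit_variational_logistic(X, y)
    accuracy = np.mean((sigmoid(X @ m) > 0.5) == (y == 1))
    print("posterior mean:", m)
    print("training accuracy:", accuracy)
```

Each iteration alternates between a closed-form Gaussian posterior for fixed variational parameters and an update of those parameters that tightens the bound, which is what makes the weight computation straightforward in this simplified setting.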
