A novel kernel-based maximum a posteriori classification method

Kernel methods have been widely used in pattern recognition. Many kernel classifiers, such as Support Vector Machines (SVM), assume that data can be separated by a hyperplane in the kernel-induced feature space. These methods do not consider the data distribution, and it is difficult for them to output probabilities or confidences for classification. This paper proposes a novel Kernel-based Maximum A Posteriori (KMAP) classification method, which makes a Gaussian distribution assumption instead of a linear separability assumption in the feature space. Robust methods are further proposed to estimate the probability densities, and the kernel trick is used to compute the model. The model is theoretically and empirically important in that: (1) it presents a more general classification model than other kernel-based algorithms, e.g., Kernel Fisher Discriminant Analysis (KFDA); (2) it can output a probability or confidence for each classification, thereby providing potential for reasoning under uncertainty; and (3) multi-way classification is as straightforward as binary classification, because only probability calculation is involved and no one-against-one or one-against-others voting is needed. Moreover, we conduct an extensive experimental comparison with state-of-the-art classification methods, such as SVM and KFDA, on eight UCI benchmark data sets and three face data sets. The results demonstrate that KMAP achieves very promising performance compared with the other models.
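
The general idea of Gaussian MAP classification in a kernel-induced feature space can be illustrated with a minimal sketch. This is not the paper's estimator: it assumes an RBF kernel, approximates the feature space with a few kernel PCA components, and uses a simple ridge term on each class covariance in place of the robust density estimation the abstract mentions. The function names (`rbf_kernel`, `fit_kernel_gaussian_map`, `predict_proba`) and all parameter defaults are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    # Pairwise squared distances, then the Gaussian (RBF) kernel.
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_kernel_gaussian_map(X, y, gamma=0.5, n_components=5, reg=1e-3):
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    # Center the kernel matrix (centering in feature space).
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ K @ H
    # Kernel PCA: keep the leading eigenvectors of the centered kernel.
    w, V = np.linalg.eigh(Kc)
    idx = np.argsort(w)[::-1][:n_components]
    A = V[:, idx] / np.sqrt(np.maximum(w[idx], 1e-12))
    Z = Kc @ A  # training points projected onto the kept components
    classes = np.unique(y)
    params = []
    for c in classes:
        Zc = Z[y == c]
        mu = Zc.mean(axis=0)
        # Assumed regularizer: ridge term keeps the covariance invertible.
        cov = np.atleast_2d(np.cov(Zc, rowvar=False)) + reg * np.eye(Z.shape[1])
        params.append((np.log(len(Zc) / n), mu, cov))
    return {"X": X, "K": K, "A": A, "gamma": gamma,
            "classes": classes, "params": params}

def predict_proba(model, X_new):
    K = model["K"]
    Kt = rbf_kernel(X_new, model["X"], model["gamma"])
    # Center the test kernel using training statistics.
    Ktc = Kt - K.mean(axis=0)[None, :] - Kt.mean(axis=1)[:, None] + K.mean()
    Z = Ktc @ model["A"]
    log_post = []
    for log_prior, mu, cov in model["params"]:
        d = Z - mu
        _, logdet = np.linalg.slogdet(cov)
        maha = np.einsum("ij,ij->i", d @ np.linalg.inv(cov), d)
        # Log prior plus Gaussian log density, up to a class-independent constant.
        log_post.append(log_prior - 0.5 * (logdet + maha))
    L = np.stack(log_post, axis=1)
    L -= L.max(axis=1, keepdims=True)  # stabilize before exponentiating
    P = np.exp(L)
    return P / P.sum(axis=1, keepdims=True)
```

Calling `predict_proba(model, X_new)` returns one posterior per class for each sample, and the predicted label is `model["classes"][P.argmax(axis=1)]`. The same code handles two classes or many, since only the per-class posteriors are compared and no pairwise voting scheme is involved.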
