Kernel Discriminant Analysis for Positive Definite and Indefinite Kernels

Kernel methods are a class of well-established and successful algorithms for pattern analysis, thanks to their mathematical elegance and good performance. Numerous nonlinear extensions of pattern recognition techniques have been proposed based on the so-called kernel trick. The objective of this paper is twofold. First, we derive a kernel tool that has so far been missing, namely the kernel quadratic discriminant (KQD). We discuss different formulations of KQD based on the regularized kernel Mahalanobis distance in both complete and class-related subspaces. Second, we propose suitable extensions of the kernel linear and quadratic discriminants to indefinite kernels. The resulting classifiers are applicable to kernels defined by any symmetric similarity measure, which is important in practice because problem-suited proximity measures often violate the requirement of positive definiteness. As in the traditional case, KQD can be advantageous for data with unequal class spreads in the kernel-induced spaces, which cannot be well separated by a linear discriminant. We illustrate this on artificial and real data for both positive definite and indefinite kernels.
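
To make the central ingredient concrete, the following is a minimal sketch, assuming a standard positive definite (Gaussian RBF) kernel, of a classifier that assigns a test point to the class with the smallest regularized kernel Mahalanobis distance, computed entirely from the class kernel matrix via the Woodbury identity. The function names (rbf_kernel, kqd_fit, kqd_predict), the regularization parameter reg, and the kernel width gamma are illustrative choices, not the paper's exact formulation, and the indefinite-kernel extension is not covered here.

```python
# Illustrative sketch only: regularized kernel Mahalanobis distance classifier.
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    # Gaussian RBF kernel matrix between row-wise sample sets A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kqd_fit(X, y, kernel=rbf_kernel, reg=1e-1):
    # Per class: kernel matrix, centering matrix, and the inverse needed by the
    # Woodbury identity for the regularized kernel Mahalanobis distance.
    model = []
    for c in np.unique(y):
        Xc = X[y == c]
        n = len(Xc)
        K = kernel(Xc, Xc)
        H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
        K_cent = H @ K @ H                           # centered class kernel
        inv = np.linalg.inv(n * reg * np.eye(n) + K_cent)
        model.append(dict(label=c, Xc=Xc, K=K, H=H, inv=inv, n=n, reg=reg))
    return model

def kqd_predict(model, X, kernel=rbf_kernel):
    # Assign each test point to the class with the smallest regularized
    # kernel Mahalanobis distance to that class's mean in feature space.
    dists = []
    for m in model:
        Xc, K, H, inv, n, reg = m["Xc"], m["K"], m["H"], m["inv"], m["n"], m["reg"]
        kx = kernel(Xc, X)                           # (n_class, n_test)
        kxx = np.array([kernel(X[i:i + 1], X[i:i + 1])[0, 0] for i in range(len(X))])
        ones = np.ones(n)
        # squared distance of phi(x) to the class mean in feature space
        norm2 = kxx - 2.0 / n * ones @ kx + (ones @ K @ ones) / n ** 2
        k_cent = H @ (kx - (K @ ones)[:, None] / n)  # centered kernel vector
        quad = np.einsum("im,ij,jm->m", k_cent, inv, k_cent)
        dists.append((norm2 - quad) / reg)           # regularized distance
    labels = np.array([m["label"] for m in model])
    return labels[np.argmin(np.array(dists), axis=0)]

# Toy usage: two Gaussian blobs with unequal spread, a setting where a
# quadratic discriminant can outperform a linear one.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0, 1.0, (40, 2)), rng.normal(3, 0.3, (40, 2))])
y_train = np.array([0] * 40 + [1] * 40)
model = kqd_fit(X_train, y_train)
print(kqd_predict(model, np.array([[0.0, 0.0], [3.0, 3.0]])))
```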
