Chernoff Dimensionality Reduction: Where Fisher Meets FKT

The well-known linear discriminant analysis (LDA) based on the Fisher criterion cannot deal with heteroscedasticity in the data. Yet in many practical applications the data are heteroscedastic, i.e., the within-class scatter matrices cannot be expected to be equal. A linear dimensionality reduction technique based on the Chernoff criterion has recently been proposed; it extends Fisher’s well-known LDA and can exploit information about heteroscedasticity in the data. While the Chernoff criterion has been shown to outperform the Fisher criterion, a clear understanding of its exact behavior has been lacking. Moreover, the criterion, as originally introduced, is rather complex, which makes it difficult to state its relationship to other linear dimensionality reduction techniques. In this paper, we show precisely what can be expected from the Chernoff criterion and how it relates to the Fisher criterion and the Fukunaga-Koontz transform (FKT). Furthermore, we show that a recently proposed decomposition of the data space into four subspaces is incomplete, and we argue how that decomposition is best enriched to account for heteroscedasticity in the data. Finally, we provide experimental results validating our theoretical analysis.
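To make the connection to the Fukunaga-Koontz transform concrete, the following is a minimal numerical sketch (not the paper’s method) on synthetic two-class heteroscedastic Gaussian data. It illustrates the defining FKT property: after whitening with respect to the summed class scatter, the two transformed scatter matrices share eigenvectors and their eigenvalues pair up to sum to one, so directions most discriminative for one class are least so for the other. The class means, covariances, and sample sizes below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two synthetic Gaussian classes with unequal covariances (heteroscedastic data).
X1 = rng.multivariate_normal([0.0, 0.0], [[4.0, 0.0], [0.0, 1.0]], size=500)
X2 = rng.multivariate_normal([3.0, 0.0], [[1.0, 0.0], [0.0, 4.0]], size=500)

S1 = np.cov(X1, rowvar=False)
S2 = np.cov(X2, rowvar=False)

# Fukunaga-Koontz transform: whiten with respect to the summed scatter S1 + S2.
S = S1 + S2
d, U = np.linalg.eigh(S)
P = U @ np.diag(d ** -0.5) @ U.T  # symmetric inverse square root (S1 + S2)^(-1/2)

S1t = P @ S1 @ P
S2t = P @ S2 @ P  # by construction S1t + S2t = I

# S1t and S2t share eigenvectors; eigenvalues pair as (lam, 1 - lam).
lam1, V = np.linalg.eigh(S1t)
lam2 = np.diag(V.T @ S2t @ V)
print(np.allclose(lam1 + lam2, 1.0))
```

Because `S1t + S2t` equals the identity by construction, any eigenvector of `S1t` with eigenvalue `lam` is automatically an eigenvector of `S2t` with eigenvalue `1 - lam`, which is the subspace-pairing behavior the FKT exploits.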
