Geometric Mean for Subspace Selection

Subspace selection approaches are powerful tools in pattern classification and data visualization. One of the most important subspace approaches is the linear dimensionality reduction step in Fisher's linear discriminant analysis (FLDA), which has been successfully employed in many fields such as biometrics, bioinformatics, and multimedia information management. However, the linear dimensionality reduction step in FLDA has a critical drawback: for a classification task with c classes, if the dimension of the projected subspace is strictly lower than c - 1, the projection tends to merge classes that are close together in the original feature space. If the classes are sampled from Gaussian distributions, all with identical covariance matrices, then the linear dimensionality reduction step in FLDA maximizes the arithmetic mean of the Kullback-Leibler (KL) divergences between the different classes. Based on this viewpoint, the geometric mean for subspace selection is studied in this paper. Three criteria are analyzed: 1) maximization of the geometric mean of the KL divergences, 2) maximization of the geometric mean of the normalized KL divergences, and 3) the combination of 1 and 2. Preliminary experimental results on synthetic data, data from the UCI Machine Learning Repository, and handwritten digits show that the third criterion is a promising discriminative subspace selection method, which significantly reduces the class-separation problem compared with the linear dimensionality reduction step in FLDA and several of its representative extensions.
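The intuition behind the geometric-mean criterion can be sketched numerically. The snippet below is a hypothetical illustration, not the paper's algorithm: assuming Gaussian classes with a shared covariance, the KL divergence between two classes after projection by a matrix W reduces to half the Mahalanobis distance between the projected means, and one can compare the arithmetic mean (the FLDA-style objective) with the geometric mean over all class pairs.

```python
import numpy as np

def kl_gaussian_shared_cov(mu_i, mu_j, sigma, W):
    # KL divergence between two Gaussians with identical covariance,
    # after projection by W: 0.5 * d^T (W^T Sigma W)^{-1} d,
    # where d is the projected mean difference.
    d = W.T @ (mu_i - mu_j)
    S = W.T @ sigma @ W
    return 0.5 * d @ np.linalg.solve(S, d)

def pairwise_kls(mus, sigma, W):
    # Divergences over all unordered class pairs.
    c = len(mus)
    return np.array([kl_gaussian_shared_cov(mus[i], mus[j], sigma, W)
                     for i in range(c) for j in range(i + 1, c)])

# Toy data: four classes in 3-D, one pair of them very close together.
mus = [np.array([0.0, 0.0, 0.0]),
       np.array([0.2, 0.0, 0.0]),   # close to the first class
       np.array([5.0, 0.0, 0.0]),
       np.array([0.0, 5.0, 0.0])]
sigma = np.eye(3)
W = np.array([[1.0, 0.0],            # project onto the first two axes
              [0.0, 1.0],
              [0.0, 0.0]])

kls = pairwise_kls(mus, sigma, W)
arithmetic = kls.mean()                  # FLDA-style mean criterion
geometric = np.exp(np.log(kls).mean())   # geometric-mean criterion
```

On this toy example the one small pairwise divergence barely lowers the arithmetic mean but pulls the geometric mean down sharply, which is why maximizing the geometric mean penalizes projections that merge nearby classes.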
