Discriminative cluster analysis

Clustering is one of the most widely used statistical tools for data analysis. Among existing clustering techniques, k-means is very popular because it is easy to program and offers a good trade-off between performance and computational complexity. However, k-means is prone to local minima and scales poorly to high-dimensional data sets. A common approach to high-dimensional data is to cluster in the space spanned by the principal components (PC). In this paper, we show the benefits of clustering in a low-dimensional discriminative space rather than in the generative PC space. In particular, we propose a new clustering algorithm called Discriminative Cluster Analysis (DCA), which jointly performs dimensionality reduction and clustering. Several toy and real examples show the benefits of DCA over traditional PCA+k-means clustering. Additionally, we propose a new matrix formulation and provide connections with related techniques such as spectral graph methods and linear discriminant analysis.
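For readers unfamiliar with the baseline mentioned in the abstract, the following is a minimal sketch of PCA followed by k-means, assuming NumPy and a plain Lloyd-style iteration. It is not the DCA algorithm and is not taken from the paper; the function names pca_project and kmeans, the synthetic data, and all parameter values are hypothetical choices made for illustration only.

```python
import numpy as np


def pca_project(X, n_components):
    """Project the rows of X onto the top principal components."""
    Xc = X - X.mean(axis=0)  # center each feature
    # Rows of Vt are the principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def kmeans(Z, k, n_iter=100, seed=0):
    """Plain Lloyd iteration; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # Initialize centers with k distinct data points (illustrative choice).
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster is empty.
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Synthetic data (assumed for illustration): two groups in 50 dimensions
# that differ only by a large shift along the first coordinate.
rng = np.random.default_rng(0)
A = rng.normal(0.0, 1.0, size=(40, 50))
B = rng.normal(0.0, 1.0, size=(40, 50))
B[:, 0] += 20.0
X = np.vstack([A, B])

Z = pca_project(X, n_components=2)  # reduce dimensionality first
labels, centers = kmeans(Z, k=2)    # then cluster in the PC space
```

In this sketch the two steps are decoupled: the projection is chosen to preserve variance without regard to cluster structure, which is the limitation the abstract contrasts with a jointly learned discriminative space.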
