Learning a subspace for clustering via pattern shrinking

Clustering is a basic technique in information processing. Traditional clustering methods, however, are not suitable for high-dimensional data; thus, learning a subspace for clustering has emerged as an important research direction. Yet meaningful data often lie on a low-dimensional manifold, and existing subspace learning approaches cannot fully capture the nonlinear structure of this hidden manifold. In this paper, we propose a novel subspace learning method that not only characterizes both the linear and nonlinear structures of the data but also reflects the requirements of the subsequent clustering step. Compared with other related approaches, the proposed method can derive a subspace that is better suited to clustering high-dimensional data. Promising experimental results on different kinds of data sets demonstrate the effectiveness of the proposed approach.
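To make the generic "learn a subspace, then cluster" pipeline concrete, here is a minimal sketch. It is not the proposed method. It uses PCA as a purely linear stand-in for the learned subspace and runs plain Lloyd's k-means in that subspace. The sketch does not model the nonlinear manifold structure or the pattern-shrinking step the abstract refers to. The function names (`pca_subspace`, `farthest_first_init`, `kmeans`, `subspace_then_cluster`) and the farthest-first initialization are hypothetical illustrative choices.

```python
import numpy as np


def pca_subspace(X, d):
    """Return a (D, d) orthonormal basis of the top-d principal directions of X.

    Linear stand-in only: this ignores any nonlinear (manifold) structure.
    """
    Xc = X - X.mean(axis=0)
    # The right singular vectors of the centered data are the principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:d].T


def farthest_first_init(Z, k):
    """Deterministic initialization: start at Z[0], then repeatedly add the farthest point."""
    idx = [0]
    d2 = ((Z - Z[0]) ** 2).sum(axis=1)
    for _ in range(1, k):
        nxt = int(d2.argmax())
        idx.append(nxt)
        # Keep, for every point, the squared distance to its nearest chosen center.
        d2 = np.minimum(d2, ((Z - Z[nxt]) ** 2).sum(axis=1))
    return Z[idx].copy()


def kmeans(Z, k, n_iter=100):
    """Plain Lloyd's k-means on the rows of Z; returns one integer label per row."""
    centers = farthest_first_init(Z, k)
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        dists = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its points; keep it if its cluster is empty.
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def subspace_then_cluster(X, d, k):
    """Project X onto a d-dimensional PCA subspace, then run k-means there."""
    W = pca_subspace(X, d)
    Z = (X - X.mean(axis=0)) @ W
    return kmeans(Z, k)


if __name__ == "__main__":
    # Synthetic example: two Gaussian groups in 50 dimensions, shifted along one axis.
    rng = np.random.default_rng(0)
    A = rng.normal(size=(30, 50))
    B = rng.normal(size=(30, 50))
    B[:, 0] += 30.0
    X = np.vstack([A, B])
    print(subspace_then_cluster(X, d=2, k=2))
```

The linear projection here is exactly the part that the abstract argues is insufficient when the data lie on a nonlinear manifold. A method like the one proposed would replace `pca_subspace` with a subspace that also reflects that nonlinear structure and the needs of the clustering step.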
