Unsupervised Learning: Self-aggregation in Scaled Principal Component Space

We demonstrate that data clustering amounts to a dynamic process of self-aggregation in which data objects move towards each other to form clusters, revealing the inherent pattern of similarity. Self-aggregation is governed by connectivity and occurs in a space obtained by a nonlinear scaling of principal component analysis (PCA). The method combines dimensionality reduction and clustering in a single framework. It applies to both square similarity matrices and rectangular association matrices.
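The abstract does not spell out the scaling, so the following is only a minimal sketch under a stated assumption: that a square similarity matrix W is scaled by its row sums d as D^{-1/2} W D^{-1/2}, and that each object is then placed at coordinates D^{-1/2} z_k for the k leading eigenvectors z_k. The function name `scaled_pca_embedding` and the toy matrix are hypothetical, chosen for illustration. In the idealized case of two disconnected groups, objects in the same group land on exactly the same point, which is one way to picture self-aggregation.

```python
import numpy as np

def scaled_pca_embedding(W, k):
    """Hypothetical helper: coordinates from the k leading eigenvectors
    of the degree-scaled similarity matrix D^{-1/2} W D^{-1/2}.
    The scaling form is an assumption, not taken from the abstract."""
    d = W.sum(axis=1)                      # row sums (degrees); must be > 0
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric scaling of the similarity matrix.
    W_hat = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh handles symmetric matrices and returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(W_hat)
    top = np.argsort(eigvals)[::-1][:k]    # indices of the k largest
    # Undo the scaling so that coordinates are constant within a
    # disconnected group.
    Q = d_inv_sqrt[:, None] * eigvecs[:, top]
    return Q, eigvals[top]

# Toy similarity matrix: two groups (3 and 2 objects) with no
# similarity between groups.
A = np.array([[1.0, 0.8, 0.6],
              [0.8, 1.0, 0.7],
              [0.6, 0.7, 1.0]])
B = np.array([[1.0, 0.9],
              [0.9, 1.0]])
W = np.zeros((5, 5))
W[:3, :3] = A
W[3:, 3:] = B

Q, lam = scaled_pca_embedding(W, k=2)
print(np.round(Q, 3))    # rows 0-2 coincide, rows 3-4 coincide
print(np.round(lam, 3))  # leading eigenvalues of the scaled matrix
```

With weak but nonzero similarity between groups, the rows would only approximately coincide, and a simple clustering step in the embedded space (for example K-means) would be one way to read off the groups. The rectangular association-matrix case mentioned above is not covered by this sketch.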
