Clustering with Local and Global Regularization

Clustering is a long-standing research topic in data mining and machine learning, and most traditional clustering methods can be categorized as either local or global. This paper proposes a novel clustering method, Clustering with Local and Global Regularization (CLGR), that exploits both the local and the global information in a data set by minimizing a cost function that properly trades off a local cost against a global one. We show that the resulting optimization problem can be solved through the eigenvalue decomposition of a sparse symmetric matrix, which can be computed efficiently with iterative methods. Experimental results on several data sets demonstrate the effectiveness of the method.
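
Since the core computation described above reduces to extracting a few eigenvectors of a sparse symmetric matrix, a minimal sketch of that pipeline in Python follows. It is not the authors' implementation: the matrix M below is a plain convex combination of an unnormalized and a normalized graph Laplacian, standing in for CLGR's actual local-plus-global regularizer, and the function name, the `lam` trade-off parameter, and all defaults are invented for illustration. The `eigsh` solver is a Lanczos-type iterative method, which matches the abstract's note that the decomposition can be done efficiently with iterative methods.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import eigsh
from sklearn.cluster import KMeans
from sklearn.neighbors import kneighbors_graph

def clgr_style_clustering(X, n_clusters=3, n_neighbors=10, lam=0.5):
    """Cluster the rows of X via a CLGR-style spectral relaxation.

    `lam` is a hypothetical knob trading off the "local" and "global"
    cost terms; the matrix assembled here is an illustrative stand-in
    for the paper's actual combined regularizer.
    """
    # "Local" structure: a symmetrized k-nearest-neighbor graph.
    W = kneighbors_graph(X, n_neighbors=n_neighbors, mode='connectivity')
    W = 0.5 * (W + W.T)
    d = np.asarray(W.sum(axis=1)).ravel()
    L = diags(d) - W                      # unnormalized graph Laplacian
    # Stand-in "global" term: the normalized Laplacian of the same graph.
    d_inv_sqrt = diags(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L_norm = d_inv_sqrt @ L @ d_inv_sqrt
    M = lam * L + (1.0 - lam) * L_norm    # sparse and symmetric by construction
    # Lanczos-type iterative eigensolver: the n_clusters smallest
    # eigenvectors of M give the relaxed cluster-indicator embedding.
    _, vecs = eigsh(M, k=n_clusters, which='SM')
    # Discretize the continuous relaxation with k-means.
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(vecs)
```

As in standard spectral clustering, the k-means step at the end discretizes the continuous eigenvector solution back into hard cluster assignments.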
