Paper information - Two-level k-means clustering algorithm for k-tau relationship establishment and linear-time classification

Partitional clustering algorithms, which partition the dataset into a pre-defined number of clusters, can be broadly classified into two types: algorithms which explicitly take the number of clusters as input and algorithms that take the expected size of a cluster as input. In this paper, we propose a variant of the k-means algorithm and prove that it is more efficient than standard k-means algorithms. An important contribution of this paper is the establishment of a relation between the number of clusters and the size of the clusters in a dataset through the analysis of our algorithm. We also demonstrate that the integration of this algorithm as a pre-processing step in classification algorithms reduces their running-time complexity.
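The abstract does not spell out the two-level procedure, so the following is only a minimal sketch of the general idea of using k-means as a pre-processing step for classification, not the authors' algorithm. It assumes plain Lloyd-style k-means run separately on each class, and a nearest-prototype rule over the resulting centroids; the names `kmeans`, `build_prototypes`, `classify`, and the parameter `k_per_class` are hypothetical and chosen for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd-style k-means (illustrative); returns a list of k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids


def build_prototypes(points, labels, k_per_class):
    """Cluster each class on its own and keep (centroid, label) pairs."""
    by_class = {}
    for p, y in zip(points, labels):
        by_class.setdefault(y, []).append(p)
    prototypes = []
    for y, members in by_class.items():
        k = min(k_per_class, len(members))
        prototypes.extend((c, y) for c in kmeans(members, k))
    return prototypes


def classify(x, prototypes):
    """Assign x the label of its nearest prototype."""
    return min(prototypes, key=lambda proto: squared_distance(x, proto[0]))[1]


# Toy data, invented for this example only.
train = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
labels = ["a", "a", "a", "b", "b", "b"]
prototypes = build_prototypes(train, labels, k_per_class=2)
print(classify((0.1, 0.1), prototypes))
print(classify((5.0, 5.0), prototypes))
```

In this sketch a query is compared against the prototypes only, so the cost of classifying one point depends on the number of prototypes rather than on the size of the training set. How the paper chooses the number and size of clusters is not stated in the abstract and is not modelled here.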

M. Narasimha Murty | Radha Chitta
