K-Means Clustering Algorithms: Implementation and Comparison

Discovering relationships within large volumes of biological data has become an active research topic. Clustering methods are desirable because they group similar data together, so that when many related items are needed they can all be found close to a given search result. Here we study a popular data clustering method, k-means clustering. We implement two k-means clustering algorithms, Lloyd's k-means clustering and progressive greedy k-means clustering, and compare their results. Our experiments compare the two algorithms in terms of running time and distance efficiency.
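As an illustration of the first of the two algorithms, the following is a minimal sketch of the standard Lloyd iteration: assign each point to its nearest center, then move each center to the mean of its assigned points, and repeat until the assignment stops changing. It is not the implementation evaluated here. The function names, the random initialization, the `max_iters` and `seed` parameters, and the use of total squared distance as a stand-in for "distance efficiency" are all assumptions made for the example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def lloyd_kmeans(points, k, max_iters=100, seed=0):
    """Hypothetical helper: returns (centers, labels, cost).

    Initialization by sampling k input points is an assumption;
    other seeding schemes are equally valid.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: index of the nearest center for each point.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers will not move
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = mean_point(members)
    # Total squared distance from each point to its assigned center.
    cost = sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, cost
```

Calling `lloyd_kmeans(points, k)` on a list of coordinate tuples returns the final centers, one cluster label per point, and the total squared distance. Wrapping the call with `time.perf_counter()` gives a simple running-time measurement, so the same harness could be applied to a second algorithm for comparison.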
