Parallel K-Means Algorithm on Distributed Memory Multiprocessors

Clustering large data sets can be time-consuming and processor-intensive. This project implements a parallel version of a popular clustering algorithm, the k-means algorithm, to provide faster clustering solutions. The implementation was tested by generating 3, 4, 5, and 7 clusters on a network of Sun workstations. Optimal levels of speedup were not achieved, but the benefits of parallelization were observed. The methodology exploits the inherent data parallelism in the k-means algorithm and makes use of the message-passing model.
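
As an illustration of the data parallelism referred to above, the following is a minimal sketch and not the project's actual implementation. It assumes each worker holds a slice of the data, computes per-cluster partial sums and counts against the current centroids, and that these partial results are then combined to update the centroids. The function names (`local_stats`, `combine`, `parallel_kmeans`), the round-robin partitioning, and the random initialization are all hypothetical choices made for this example. The workers are simulated sequentially in one process; in a real message-passing program the `combine` step would be carried out by exchanging messages, for example with an all-reduce operation.

```python
import random

def local_stats(points, centroids):
    """One worker's step: assign its own points to the nearest centroid
    and return per-cluster partial sums and counts."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points:
        # Nearest centroid by squared Euclidean distance.
        j = min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts

def combine(partials, k, dim):
    """Stand-in for the message exchange: add up every worker's partial results."""
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for part_sums, part_counts in partials:
        for j in range(k):
            counts[j] += part_counts[j]
            for d in range(dim):
                sums[j][d] += part_sums[j][d]
    return sums, counts

def parallel_kmeans(data, k, num_workers, iters=20, seed=0):
    """Simulated data-parallel k-means (assumed structure, see lead-in)."""
    dim = len(data[0])
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(data, k)]
    # Assumed partitioning: worker i gets every num_workers-th point.
    chunks = [data[i::num_workers] for i in range(num_workers)]
    for _ in range(iters):
        partials = [local_stats(chunk, centroids) for chunk in chunks]
        sums, counts = combine(partials, k, dim)
        # Empty clusters keep their previous centroid.
        new = [[s / counts[j] for s in sums[j]] if counts[j] else centroids[j]
               for j in range(k)]
        if new == centroids:
            break
        centroids = new
    return centroids
```

Because each worker only needs the current centroids and sends back k partial sums and counts per iteration, the communication volume in this sketch depends on the number of clusters and dimensions rather than on the number of data points.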
