Parallelizing K-Means Algorithm for 1-D Data Using MPI

Nowadays, colossal amounts of information are produced by computational systems and electronic instruments such as telescopes and medical devices. To explore these petabytes of data, new fast algorithms must be developed or existing ones redesigned. Clustering is one of the most popular and useful techniques for discovering and extracting information from data pools, and k-means is an algorithm that clusters data according to their characteristics. Its main disadvantage is its computational complexity, which makes the technique very difficult to apply to big data sets. Although k-means is a very well studied technique, a fully parallel version of it has not been explored yet. In this work, a parallel version of k-means is presented for 1-D objects. The experimental results obtained are in line with the theoretical outcome and confirm both the correctness and the effectiveness of the technique.
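To make the idea of a data-parallel k-means concrete, the following is a minimal illustrative sketch for 1-D data. It is not the algorithm presented in this work, whose details are not given here. It assumes the common scheme in which each worker computes per-cluster partial sums and counts on its own share of the data and the partial results are then combined, as a sum reduction across MPI processes would do. The workers are simulated with plain Python lists, and the names partial_stats, kmeans_1d and workers are hypothetical.

```python
def partial_stats(chunk, centroids):
    """Per-cluster sums and counts for one worker's share of the data."""
    k = len(centroids)
    sums = [0.0] * k
    counts = [0] * k
    for x in chunk:
        # In 1-D the nearest centroid minimizes the absolute difference.
        j = min(range(k), key=lambda c: abs(x - centroids[c]))
        sums[j] += x
        counts[j] += 1
    return sums, counts


def kmeans_1d(data, centroids, workers=4, max_iter=100, tol=1e-9):
    """Lloyd-style k-means on 1-D data with simulated workers (assumption:
    a real MPI version would replace the list comprehension over chunks
    with one process per chunk and a sum reduction of the partial results)."""
    centroids = list(centroids)
    k = len(centroids)
    # Split the data into one strided slice per simulated worker.
    chunks = [data[i::workers] for i in range(workers)]
    for _ in range(max_iter):
        parts = [partial_stats(chunk, centroids) for chunk in chunks]
        # Combine the partial results (the role of the reduction step).
        sums = [sum(p[0][j] for p in parts) for j in range(k)]
        counts = [sum(p[1][j] for p in parts) for j in range(k)]
        # An empty cluster keeps its previous centroid.
        new = [sums[j] / counts[j] if counts[j] else centroids[j]
               for j in range(k)]
        if max(abs(a - b) for a, b in zip(new, centroids)) < tol:
            return new
        centroids = new
    return centroids


points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
print(kmeans_1d(points, [0.0, 5.0], workers=3))
```

In this sketch only the k sums and k counts are exchanged per iteration rather than the data points themselves, which is why the scheme is usually considered attractive when the number of points is much larger than the number of clusters.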