Enhanced K-Means Clustering Algorithm Using Red Black Tree and Min-Heap

  Abstract—Fast, high-quality clustering is one of the most important tasks in modern information processing, where people rely heavily on search engines such as Google, Yahoo, and Bing. Given the huge amount of available data and the aim of creating better-quality clusters, scores of algorithms with different quality-complexity trade-offs have been proposed. However, the k-means algorithm, proposed during the late 1970s, still holds a respectable position among clustering algorithms and is considered one of the most fundamental algorithms of data mining. It is an iterative algorithm: each iteration requires computing the distance between every data object and the centroid of every cluster. Given the size of modern databases, this step alone becomes expensive. In this paper, we propose an improved version of the k-means algorithm that remedies this problem. The algorithm employs two data structures, namely a red-black tree and a min-heap, both of which are readily available in modern programming languages: the red-black tree as map in C++ and TreeMap in Java, and the min-heap as the priority queue of the C++ standard template library. The implementation of our algorithm is therefore as simple as that of the traditional algorithm. We have carried out extensive experiments, and the results establish the superiority of our version of the k-means algorithm over the traditional one.
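
  To make the cost of the traditional algorithm concrete, the following C++ sketch shows the assignment step that the abstract describes, in which every data object is compared with every centroid. It also shows how the two standard containers named above can be declared. This is only an illustration under stated assumptions, not the proposed algorithm: the abstract does not say how the two structures are combined, and the names Point, squaredDistance, assignToNearest, MinHeap, smallestViaMinHeap, and clusterSizes are hypothetical. Note that std::priority_queue is a max-heap by default, so std::greater is needed to obtain min-heap behavior, and that std::map is a red-black tree in common standard library implementations rather than by requirement of the C++ standard.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <queue>
#include <vector>

// Assumption: a data object is a dense vector of coordinates.
using Point = std::vector<double>;

// Squared Euclidean distance between two points of equal dimension.
double squaredDistance(const Point& a, const Point& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d) {
        const double diff = a[d] - b[d];
        sum += diff * diff;
    }
    return sum;
}

// Assignment step of traditional k-means: each of the n points is compared
// with each of the k centroids, so one pass costs n * k distance computations.
std::vector<std::size_t> assignToNearest(const std::vector<Point>& points,
                                         const std::vector<Point>& centroids) {
    assert(!centroids.empty());
    std::vector<std::size_t> labels(points.size(), 0);
    for (std::size_t i = 0; i < points.size(); ++i) {
        double best = squaredDistance(points[i], centroids[0]);
        for (std::size_t c = 1; c < centroids.size(); ++c) {
            const double dist = squaredDistance(points[i], centroids[c]);
            if (dist < best) {
                best = dist;
                labels[i] = c;
            }
        }
    }
    return labels;
}

// std::priority_queue is a max-heap by default; std::greater turns it into a min-heap.
using MinHeap = std::priority_queue<double, std::vector<double>, std::greater<double>>;

// Hypothetical helper: smallest value of a non-empty list, read from the top of a min-heap.
double smallestViaMinHeap(const std::vector<double>& values) {
    assert(!values.empty());
    MinHeap heap(values.begin(), values.end());
    return heap.top();
}

// Hypothetical helper: number of points per cluster, kept in a std::map
// so that the entries stay ordered by cluster index.
std::map<std::size_t, std::size_t> clusterSizes(const std::vector<std::size_t>& labels) {
    std::map<std::size_t, std::size_t> sizes;
    for (std::size_t label : labels) {
        ++sizes[label];
    }
    return sizes;
}
```

  Calling assignToNearest with n points and k centroids performs n * k distance computations in every iteration, which is the per-iteration cost that the proposed algorithm aims to reduce.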