Performance Evaluation of Clustering Algorithms for Scalable Image Retrieval

In this paper we present scalable algorithms for color-based image retrieval. To achieve scalability, we cluster the images in the database into groups with similar color content. At search time the query image is first compared with the pre-computed clusters, and only the images in the closest clusters are then compared with the query image. This avoids comparing the query image with every image in the database, making the search scalable to large databases. We use two clustering techniques, hierarchical clustering and K-means clustering, and compare their performance under three similarity measures: the histogram intersection measure, the L1 measure, and the L2 measure. Retrieval accuracy is computed by comparing the results of retrieval with clustering against the results of retrieval without clustering. Our experiments with a database of 2000 color images show that both clustering techniques offer a retrieval accuracy of over 90% with an average of only 300 similarity comparisons (as opposed to the 2000 comparisons required for retrieval without clustering). The hierarchical clustering algorithm outperforms the K-means clustering algorithm for all three similarity measures, although only marginally in some cases.
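The two-stage search described above can be illustrated with a minimal Python sketch. It is not the implementation evaluated in this paper. It makes several assumptions that the abstract does not state: each cluster is represented by the mean histogram of its members, histogram intersection is turned into a distance as one minus the intersection of normalized histograms, and retrieval accuracy is taken as the fraction of the exhaustive-search results that the clustered search also returns. All function names and the six-image toy database are hypothetical and chosen only for illustration.

```python
from math import sqrt

def histogram_intersection(h, g):
    """Similarity of two normalized histograms: sum of bin-wise minima."""
    return sum(min(a, b) for a, b in zip(h, g))

def intersection_distance(h, g):
    """Assumed conversion of the intersection similarity into a distance."""
    return 1.0 - histogram_intersection(h, g)

def l1_distance(h, g):
    return sum(abs(a - b) for a, b in zip(h, g))

def l2_distance(h, g):
    return sqrt(sum((a - b) ** 2 for a, b in zip(h, g)))

def mean_histogram(histograms):
    """Assumed cluster representative: bin-wise mean of the member histograms."""
    n = len(histograms)
    return [sum(bins) / n for bins in zip(*histograms)]

def exhaustive_search(query, database, distance, top_k):
    """Retrieval without clustering: compare the query with every image."""
    order = sorted(range(len(database)), key=lambda i: distance(query, database[i]))
    return order[:top_k]

def clustered_search(query, database, members, centroids, distance, top_k, n_clusters=1):
    """Retrieval with clustering: rank clusters first, then only their images.

    Returns the retrieved image indices and the number of similarity
    comparisons performed (cluster comparisons plus image comparisons).
    """
    ranked = sorted(range(len(centroids)), key=lambda c: distance(query, centroids[c]))
    candidates = [i for c in ranked[:n_clusters] for i in members[c]]
    best = sorted(candidates, key=lambda i: distance(query, database[i]))[:top_k]
    return best, len(centroids) + len(candidates)

def retrieval_accuracy(clustered, exhaustive):
    """Assumed accuracy measure: share of exhaustive results also retrieved."""
    return len(set(clustered) & set(exhaustive)) / len(exhaustive)

# Toy database of 3-bin color histograms (invented for illustration only).
database = [
    [0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.6, 0.2, 0.2],  # mostly first bin
    [0.1, 0.1, 0.8], [0.2, 0.1, 0.7], [0.1, 0.2, 0.7],  # mostly third bin
]
members = [[0, 1, 2], [3, 4, 5]]  # cluster assignments, assumed precomputed
centroids = [mean_histogram([database[i] for i in m]) for m in members]

query = [0.75, 0.15, 0.10]
baseline = exhaustive_search(query, database, l1_distance, top_k=2)
best, comparisons = clustered_search(query, database, members, centroids,
                                     l1_distance, top_k=2)
accuracy = retrieval_accuracy(best, baseline)
print(sorted(best), comparisons, accuracy)
```

Passing `intersection_distance` or `l2_distance` in place of `l1_distance` switches the similarity measure without changing the search logic. Raising the hypothetical `n_clusters` parameter examines more clusters, trading additional comparisons for a lower chance of missing a relevant image.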