Distributed K-Median Clustering with Application to Image Clustering

Developing algorithms suited to distributed environments is increasingly important as data becomes more distributed. This paper proposes a distributed K-Median clustering algorithm for use in a distributed environment with a centralized server, such as the Napster model in a peer-to-peer environment. Several approximate methods for computing the median in a distributed environment are proposed and analyzed in the context of the iterative K-Median algorithm. The proposed algorithm allows the clustering of multivariate data while ensuring that each cluster representative remains an item in the collection. This facilitates exploratory analysis where retaining a representative in the collection is important, such as imaging applications.

Introduction and Background

● K-Means clustering is a well-known and popular clustering technique.
– It creates a new mean vector, which may not be meaningful in many applications.
● Using the median of a cluster rather than the mean is one variation on the basic K-Means algorithm.
– This is also known as the L1 multivariate median.
● Dhillon and Modha first proposed a distributed K-Means clustering algorithm.
– Computing the distributed median is more complicated.

Distributed K-Median Clustering Algorithm

Select initial cluster centers: Center(C_G)
Server communicates to peers
At each peer, P:
    For each image, x_i, at P:
        Calculate distance from image x_i to each cluster center, Center(C_G)
        Assign x_i to cluster C, where Dist(x_i, Center(C_G)) is minimized
    For each cluster, C:
        Select representatives to communicate, X_P(C)
For each cluster, C:
    Calculate new center as an approximate global median:
        Median(C_G) = WMedian({X_P(C) | P})
    where X_P(C) = {(x_i, w_i) | x_i is a representative image at peer P, w_i is the number of items at P that x_i represents}
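
One iteration of the peer and server steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the distance is taken to be L1, each peer sends a single representative per cluster (its local median, weighted by the local cluster size), and WMedian is read as the representative that minimizes the weighted sum of distances to all representatives. The names `l1_dist`, `weighted_median`, `peer_step`, and `server_step` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]
Weighted = Tuple[Vector, int]  # (representative x_i, weight w_i)


def l1_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 (Manhattan) distance; the choice of Dist is an assumption."""
    return sum(abs(p - q) for p, q in zip(a, b))


def weighted_median(items: List[Weighted]) -> Vector:
    """Return the input item with the smallest weighted sum of distances
    to all items, so the result is always an item in the collection."""
    return min(
        (x for x, _ in items),
        key=lambda c: sum(w * l1_dist(c, x) for x, w in items),
    )


def peer_step(images: List[Vector],
              centers: List[Vector]) -> Dict[int, List[Weighted]]:
    """At one peer: assign each image to its nearest center, then send one
    representative per local cluster (assumed choice of X_P(C))."""
    local: Dict[int, List[Vector]] = defaultdict(list)
    for x in images:
        nearest = min(range(len(centers)),
                      key=lambda k: l1_dist(x, centers[k]))
        local[nearest].append(x)
    return {
        k: [(weighted_median([(x, 1) for x in members]), len(members))]
        for k, members in local.items()
    }


def server_step(reports: List[Dict[int, List[Weighted]]],
                centers: List[Vector]) -> List[Vector]:
    """At the server: pool representatives from all peers and take the
    weighted median per cluster; an empty cluster keeps its old center."""
    pooled: Dict[int, List[Weighted]] = defaultdict(list)
    for report in reports:
        for k, reps in report.items():
            pooled[k].extend(reps)
    return [weighted_median(pooled[k]) if pooled[k] else centers[k]
            for k in range(len(centers))]


# Two peers holding 2-D feature vectors (made-up values for illustration).
peers = [
    [(0.0, 0.0), (1.0, 0.0), (9.0, 9.0)],
    [(0.0, 1.0), (10.0, 10.0), (10.0, 9.0)],
]
centers = [(0.0, 0.0), (10.0, 10.0)]
reports = [peer_step(images, centers) for images in peers]
new_centers = server_step(reports, centers)
```

Because both the local and the global step select from existing items, every new center is an image held by some peer, which is the property the abstract emphasizes. Sending more than one representative per cluster would trade extra communication for a closer approximation of the global median.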

[1]  Gudrun Fischer,et al.  Towards scatter/gather browsing in a hierarchical peer-to-peer network , 2005, P2PIR '05.

[2]  B. S. Manjunath,et al.  Color and texture descriptors , 2001, IEEE Trans. Circuits Syst. Video Technol.

[3]  Zhirong Yang,et al.  Interactive Content-based Image Retrieval in the Peer-to-peer Network Using Self-Organizing Maps , 2022 .

[4]  Ruoming Jin,et al.  Fast and exact out-of-core and distributed k-means clustering , 2006, Knowledge and Information Systems.

[5]  Thomas Sikora,et al.  The MPEG-7 visual standard for content description-an overview , 2001, IEEE Trans. Circuits Syst. Video Technol.

[6]  Wolfgang Müller,et al.  Fast retrieval of high-dimensional feature vectors in P2P networks using compact peer data summaries , 2003, MIR '03.

[7]  Holly E. Rushmeier,et al.  A Scalable Parallel Algorithm for Self-Organizing Maps with Applications to Sparse Data Mining Problems , 1999, Data Mining and Knowledge Discovery.

[8]  Ignacio Blanquer,et al.  A P2P Platform for Sharing Radiological Images and Diagnoses , 2004 .

[9]  Inderjit S. Dhillon,et al.  A Data-Clustering Algorithm on Distributed Memory Multiprocessors , 1999, Large-Scale Parallel Data Mining.

[10]  Wolfgang Müller,et al.  Scalable summary based retrieval in P2P networks , 2005, CIKM '05.

[11]  Irwin King,et al.  Distributed content-based visual information retrieval system on peer-to-peer networks , 2004, TOIS.

[12]  William M. Wells,et al.  Medical Image Computing and Computer-Assisted Intervention — MICCAI’98 , 1998, Lecture Notes in Computer Science.