Weighted mean of a pair of clusterings

In this paper, we introduce the weighted mean of a pair of clusterings. Given two clusterings C1 and C2, the weighted mean of C1 and C2 is a clustering Cw whose distances d(C1, Cw) and d(Cw, C2) to C1 and C2, respectively, satisfy d(C1, Cw) + d(Cw, C2) = d(C1, C2) for some clustering distance function d. We present an algorithm for its computation. Experimental results on both synthetic and real data illustrate the usefulness of the weighted mean concept. In particular, it provides a tool for cluster ensemble techniques.
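The defining property d(C1, Cw) + d(Cw, C2) = d(C1, C2) can be illustrated with a small sketch. It assumes that d is the partition distance (the minimum number of elements that must be removed so that two clusterings coincide) and that clusterings are given as lists of integer labels. The function names, the brute-force label alignment, and the element-by-element relabeling are illustrative assumptions, not necessarily the algorithm presented in the paper.

```python
from itertools import permutations


def best_alignment(a, b):
    """Map each label of b to a distinct label of a (or a fresh one) so that
    the number of agreeing elements is maximal. Brute force over
    permutations, so only suitable for a handful of clusters."""
    labels_a, labels_b = sorted(set(a)), sorted(set(b))
    # Pad a's label set with fresh labels so every label of b gets a partner.
    fresh = max(labels_a) + 1
    while len(labels_a) < len(labels_b):
        labels_a.append(fresh)
        fresh += 1
    # Contingency counts: (label in a, label in b) -> number of shared elements.
    counts = {}
    for x, y in zip(a, b):
        counts[(x, y)] = counts.get((x, y), 0) + 1
    best_score, best_map = -1, None
    for perm in permutations(labels_a, len(labels_b)):
        score = sum(counts.get((p, q), 0) for p, q in zip(perm, labels_b))
        if score > best_score:
            best_score, best_map = score, dict(zip(labels_b, perm))
    return best_score, best_map


def partition_distance(a, b):
    """Number of elements outside the best one-to-one cluster matching."""
    score, _ = best_alignment(a, b)
    return len(a) - score


def weighted_mean(c1, c2, k):
    """Relabel k of the elements on which c1 and the aligned c2 disagree
    (hypothetical helper). The result lies at distance k from c1 and
    d(c1, c2) - k from c2, for 0 <= k <= d(c1, c2)."""
    _, mapping = best_alignment(c1, c2)
    target = [mapping[y] for y in c2]  # c2 expressed in c1's label space
    cw = list(c1)
    moved = 0
    for i, (x, t) in enumerate(zip(c1, target)):
        if moved == k:
            break
        if x != t:
            cw[i] = t
            moved += 1
    return cw


c1 = [0, 0, 0, 1, 1, 1]
c2 = [0, 0, 1, 1, 2, 2]
total = partition_distance(c1, c2)
for k in range(total + 1):
    cw = weighted_mean(c1, c2, k)
    print(k, cw, partition_distance(c1, cw), partition_distance(cw, c2))
```

The additivity follows from the triangle inequality: relabeling k elements puts Cw at most k away from C1 and at most d(C1, C2) - k away from C2, and since the partition distance is a metric the two distances cannot sum to less than d(C1, C2). For realistic numbers of clusters, the brute-force alignment would be replaced by an assignment-problem solver.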
