Exploring the Performance Limit of Cluster Ensemble Techniques

Cluster ensemble techniques are a means of boosting clustering performance. However, many cluster ensemble methods suffer from high computational complexity. Indeed, the median partition problem is NP-complete. While a variety of approximate approaches yielding suboptimal solutions have been proposed in the literature, their performance is typically evaluated against ground truth. In contrast, this work explores the question of how well cluster ensemble methods perform in an absolute sense without ground truth, i.e. how they compare to the (unknown) optimal solution. We present a study that applies and extends a lower bound in an attempt to answer this question. In particular, we demonstrate the tightness of the lower bound, which indicates that there is no room left for further improvement (for the particular data set at hand). The lower bound can thus be considered a means of exploring the performance limit of cluster ensemble techniques.
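To illustrate how a lower bound can stand in for the unknown optimum, the following minimal sketch compares a candidate consensus with a simple bound on the sum of distances (SOD) to the ensemble. It rests on assumptions that are not taken from the abstract: the distance is the Mirkin (pair-counting) distance, which is a metric on partitions, and the bound is the elementary triangle-inequality bound rather than necessarily the bound studied in this work. For any partition P and any two ensemble members P_i and P_j, d(P_i, P_j) <= d(P, P_i) + d(P, P_j); summing over all pairs of n members gives SOD(P) >= sum_{i<j} d(P_i, P_j) / (n - 1). The function names and the toy ensemble are hypothetical.

```python
from itertools import combinations


def mirkin_distance(a, b):
    """Count object pairs that are co-clustered in one labeling but not the other."""
    return sum(
        (a[i] == a[j]) != (b[i] == b[j])
        for i, j in combinations(range(len(a)), 2)
    )


def sod(candidate, ensemble, dist=mirkin_distance):
    """Sum of distances from a candidate consensus to every ensemble member."""
    return sum(dist(candidate, p) for p in ensemble)


def pairwise_lower_bound(ensemble, dist=mirkin_distance):
    """Triangle-inequality bound on the SOD of ANY partition (assumes dist is a metric)."""
    n = len(ensemble)
    total = sum(dist(p, q) for p, q in combinations(ensemble, 2))
    return total / (n - 1)


# Hypothetical toy ensemble: four clusterings of six objects, given as label lists.
ensemble = [
    [0, 0, 1, 1, 2, 2],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 1, 1, 1, 2, 2],
]

bound = pairwise_lower_bound(ensemble)
# Use the best ensemble member (the "set median") as a stand-in consensus.
best = min(ensemble, key=lambda p: sod(p, ensemble))
gap = sod(best, ensemble) - bound
print(f"lower bound = {bound}, best SOD = {sod(best, ensemble)}, gap = {gap}")
```

A small gap means the candidate is provably close to the optimal median partition for this ensemble; a gap of zero would certify optimality. A large gap is inconclusive, since it may be caused by a loose bound rather than by a poor consensus.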
