A comparison of extrinsic clustering evaluation metrics based on formal constraints

There is a wide set of evaluation metrics available to compare the quality of text clustering algorithms. In this article, we define a few intuitive formal constraints on such metrics which shed light on which aspects of the quality of a clustering are captured by different metric families. These formal constraints are validated in an experiment involving human assessments, and compared with other constraints proposed in the literature. Our analysis of a wide range of metrics shows that only BCubed satisfies all formal constraints. We also extend the analysis to the problem of overlapping clustering, where items can simultaneously belong to more than one cluster. As Bcubed cannot be directly applied to this task, we propose a modified version of Bcubed that avoids the problems found with other metrics.

[1]  William M. Shaw,et al.  On the foundation of evaluation , 1986, J. Am. Soc. Inf. Sci..

[2]  Breck Baldwin,et al.  Entity-Based Cross-Document Coreferencing Using the Vector Space Model , 1998, COLING.

[3]  Chinatsu Aone,et al.  Fast and effective text mining using linear-time document clustering , 1999, KDD '99.

[4]  George Karypis,et al.  A Comparison of Document Clustering Techniques , 2000 .

[5]  G. Karypis,et al.  Criterion Functions for Document Clustering ∗ Experiments and Analysis , 2001 .

[6]  Mohamed S. Kamel,et al.  A SOM-based document clustering using phrases , 2002, Proceedings of the 9th International Conference on Neural Information Processing, 2002. ICONIP '02..

[7]  Marina MeWi Comparing Clusterings , 2002 .

[8]  Byron Dom,et al.  An Information-Theoretic External Cluster-Validity Measure , 2002, UAI.

[9]  Patrick Pantel,et al.  Efficiently Clustering Documents with Committees , 2002, PRICAI.

[10]  Joydeep Ghosh,et al.  Relationship-based clustering and cluster ensembles for high-dimensional data mining , 2002 .

[11]  Xin Liu,et al.  Document clustering based on non-negative matrix factorization , 2003, SIGIR.

[12]  Michalis Vazirgiannis,et al.  c ○ 2001 Kluwer Academic Publishers. Manufactured in The Netherlands. On Clustering Validation Techniques , 2022 .

[13]  Carol Peters,et al.  The impact of evaluation on multilingual text retrieval , 2005, SIGIR '05.

[14]  Julia Hirschberg,et al.  V-Measure: A Conditional Entropy-Based External Cluster Evaluation Measure , 2007, EMNLP.

[15]  S. Sekine,et al.  The SemEval-2007 WePS Evaluation: Establishing a benchmark for the Web People Search Task , 2007, *SEMEVAL.

[16]  Julio Gonzalo,et al.  The SemEval-2007 WePS Evaluation: Establishing a benchmark for the Web People Search Task , 2007, Fourth International Workshop on Semantic Evaluations (SemEval-2007).