Improved Estimation of Entropy for Evaluation of Word Sense Induction

Information-theoretic measures are among the most widely used techniques for evaluating clustering methods, including word sense induction (WSI) systems. Such measures rely on sample-based estimates of entropy. However, the standard maximum likelihood estimates of entropy are heavily biased, and the bias depends on, among other things, the number of clusters and the sample size. This makes the measures unreliable and unfair when the number of clusters produced by different systems varies and the sample size is not exceedingly large. This is exactly the setting of WSI evaluation, where a ground-truth number of senses (clusters) arguably does not exist and the standard evaluation scenarios use a small number of instances of each word to compute the score. We describe more accurate entropy estimators and analyze their performance both in simulations and on the evaluation of WSI systems.
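The dependence of the bias on the number of clusters and the sample size can be illustrated with a small simulation. The sketch below is illustrative only and is not taken from the paper: it compares the maximum likelihood (plug-in) estimate with the classical Miller-Madow correction, which is used here as a stand-in for "a more accurate estimator" and is not necessarily one of the estimators the abstract refers to. The function names, the uniform cluster distribution, the sample size of 50, and the trial count are all assumptions chosen for the example.

```python
import math
import random
from collections import Counter


def plugin_entropy(labels):
    """Maximum likelihood (plug-in) entropy estimate, in nats."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def miller_madow_entropy(labels):
    """Plug-in estimate plus the Miller-Madow correction (m - 1) / (2n),
    where m is the number of distinct labels observed in the sample."""
    n = len(labels)
    m = len(set(labels))
    return plugin_entropy(labels) + (m - 1) / (2 * n)


def mean_estimate(estimator, k, n, trials, rng):
    """Average an estimator over samples of size n drawn from a uniform
    distribution on k clusters (assumed setup; true entropy is log k)."""
    total = 0.0
    for _ in range(trials):
        sample = [rng.randrange(k) for _ in range(n)]
        total += estimator(sample)
    return total / trials


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 50                  # hypothetical small sample size
results = {}
for k in (2, 10, 20):
    true_h = math.log(k)
    ml = mean_estimate(plugin_entropy, k, n, 2000, rng)
    mm = mean_estimate(miller_madow_entropy, k, n, 2000, rng)
    results[k] = (true_h, ml, mm)
    print(f"k={k:2d} true={true_h:.3f} plug-in={ml:.3f} Miller-Madow={mm:.3f}")
```

With the sample size held fixed, the plug-in estimate falls further below the true entropy as the number of clusters grows, which is why scores built on it can favor or penalize systems simply for the number of clusters they produce.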
