A New Psychometric-inspired Evaluation Metric for Chinese Word Segmentation

Word segmentation is a fundamental task in Chinese language processing. However, after successive improvements, the standard metric can no longer reliably distinguish state-of-the-art word segmentation systems. In this paper, we propose a new psychometric-inspired evaluation metric for Chinese word segmentation, which aims to balance the highly skewed distribution of words across different levels of difficulty. Results on a real evaluation show that the proposed metric gives more reasonable and more distinguishable scores and correlates well with human judgement. In addition, the proposed metric can be easily extended to evaluate other NLP tasks based on sequence labelling.
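
For context, the conventional metric referred to above is usually word-level precision, recall and F1 over matched word spans. The following is a minimal sketch of that baseline only, not of the proposed psychometric-inspired metric, whose formula is not given in this abstract. The function names and the example sentence are illustrative assumptions.

```python
def to_spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    spans, start = set(), 0
    for w in words:
        spans.add((start, start + len(w)))
        start += len(w)
    return spans


def segmentation_prf(gold_words, pred_words):
    """Word-level precision, recall and F1 (hypothetical helper).

    A predicted word counts as correct only if both of its boundaries
    match a word in the gold segmentation.
    """
    gold, pred = to_spans(gold_words), to_spans(pred_words)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative example: the prediction merges two gold words into one.
gold = ["我", "喜欢", "自然", "语言", "处理"]
pred = ["我", "喜欢", "自然语言", "处理"]
p, r, f = segmentation_prf(gold, pred)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")
```

Because every word contributes equally to these counts, frequent and easy words dominate the score, which is the skew the abstract describes.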
