Making Sense of Word Sense Variation

We present a pilot study of word-sense annotation using multiple annotators, relatively polysemous words, and a heterogeneous corpus. Annotators selected senses for words in context, using an annotation interface that presented WordNet senses. Interannotator agreement (IA) results show that annotators agree well or poorly depending primarily on the individual words and their general usage properties. Our focus is on identifying systematic differences across words and annotators that can account for IA variation. We identify three lexical use factors: semantic specificity of the context, sense concreteness, and similarity of senses. We discuss systematic differences in sense selection across annotators, and present the use of association rules to mine the annotation data for these differences.
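Interannotator agreement on categorical labels such as sense choices is typically measured with a chance-corrected coefficient. The sketch below computes pairwise Cohen's kappa over hypothetical sense annotations; the annotator names, sample labels, and the `cohen_kappa` helper are illustrative assumptions, not the study's actual setup, which may rely on other coefficients such as Krippendorff's alpha.

```python
from collections import Counter
from itertools import combinations

def cohen_kappa(labels_a, labels_b):
    """Chance-corrected pairwise agreement between two annotators'
    sense labels over the same set of instances."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: the two annotators' marginal label
    # distributions, multiplied per label and summed over the label set.
    dist_a, dist_b = Counter(labels_a), Counter(labels_b)
    expected = sum(dist_a[s] * dist_b[s] for s in dist_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical sense labels for one word's instances, one list per annotator.
annotations = {
    "ann1": ["fair.n.01", "fair.n.02", "fair.n.01", "fair.n.01"],
    "ann2": ["fair.n.01", "fair.n.03", "fair.n.01", "fair.n.01"],
    "ann3": ["fair.n.02", "fair.n.03", "fair.n.01", "fair.n.01"],
}
for a, b in combinations(sorted(annotations), 2):
    print(f"kappa({a}, {b}) = {cohen_kappa(annotations[a], annotations[b]):.3f}")
```

Computing kappa per annotator pair and per word, rather than one pooled score, is what makes word-level and annotator-level variation in agreement visible.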

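Association rule mining, as used here to surface systematic annotator differences, treats each labeling decision as a transaction of attribute=value items, so that a high-confidence rule such as {annotator=ann2, word=fair} -> sense=fair.n.03 exposes one annotator's consistent sense preference. Below is a minimal brute-force sketch of that idea under assumed data; the record format, thresholds, and `mine_rules` function are hypothetical stand-ins for a real Apriori-style miner.

```python
from collections import Counter
from itertools import combinations

def mine_rules(transactions, min_support=0.1, min_confidence=0.8):
    """Enumerate association rules antecedent -> consequent over
    itemsets of size <= 3, reporting support and confidence.
    A brute-force stand-in for Apriori-style mining."""
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        items = sorted(t)
        for size in (1, 2, 3):
            for combo in combinations(items, size):
                counts[combo] += 1
    rules = []
    for itemset, count in counts.items():
        support = count / n
        if len(itemset) < 2 or support < min_support:
            continue
        # Try each item as the rule's consequent, the rest as antecedent.
        for consequent in itemset:
            antecedent = tuple(i for i in itemset if i != consequent)
            confidence = count / counts[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    return rules

# Hypothetical labeling decisions, one transaction per annotated instance.
records = [
    {"annotator=ann1", "word=fair", "sense=fair.n.01"},
    {"annotator=ann2", "word=fair", "sense=fair.n.03"},
    {"annotator=ann2", "word=fair", "sense=fair.n.03"},
]
for ante, cons, sup, conf in mine_rules(records):
    print(" & ".join(ante), "->", cons, f"(sup={sup:.2f}, conf={conf:.2f})")
```

The attribute=value encoding is the key design choice: it lets a generic rule miner pick out conjunctions of annotator, word, and contextual properties that predict a particular sense choice, which is exactly the kind of systematic difference the study looks for.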