The MASC Word Sense Corpus

The MASC project has produced a multi-genre corpus with multiple layers of linguistic annotation, together with a sentence corpus in which 1000 occurrences of each of 100 words carry WordNet 3.1 sense tags produced by multiple annotators, accompanied by in-depth inter-annotator agreement data. Here we give an overview of the contents of MASC and then focus on the word sense sentence corpus, describing the characteristics that differentiate it from other word sense corpora and detailing the inter-annotator agreement studies that have been performed on the annotations. Finally, we discuss the potential to grow the word sense sentence corpus through crowdsourcing and the plan to enhance the content and annotations of MASC through a community-based collaborative effort.
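The abstract refers to inter-annotator agreement studies without naming the coefficient used. As a hedged illustration only, and not necessarily the measure applied to the MASC sense annotations, the sketch below computes Cohen's kappa for two annotators who assigned sense labels to the same word occurrences. Kappa corrects raw agreement for the agreement expected by chance given each annotator's label distribution. The function name `cohens_kappa` and the sense labels (`s1`, `s2`, `s3`) are hypothetical placeholders, not WordNet sense keys or MASC data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Hypothetical helper for illustration; labels can be any hashable
    values (e.g. sense identifiers).
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)

    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the product of the two
    # annotators' marginal proportions, summed over labels.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label for every item;
        # kappa is undefined here, so treat it as perfect agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical sense labels for six occurrences of one word.
annotator_1 = ["s1", "s1", "s2", "s2", "s3", "s1"]
annotator_2 = ["s1", "s2", "s2", "s2", "s3", "s1"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.739
```

Here the annotators agree on 5 of 6 items (0.833), but chance agreement from their label distributions is 13/36, so kappa is 17/23, about 0.739. With more than two annotators per item, a multi-coder coefficient such as Fleiss' kappa or Krippendorff's alpha would be the usual generalization.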
