MASC: the Manually Annotated Sub-Corpus of American English

To answer the critical need for sharable, reusable resources with rich linguistic annotations, we are developing a Manually Annotated Sub-Corpus (MASC) that includes texts from diverse genres, with manual or manually validated annotations at multiple levels. These include WordNet senses and FrameNet frames and frame elements, both of which have become significant resources in the international computational linguistics community. To derive maximal benefit from the semantic information these resources provide, MASC will also include manually validated shallow parses and named entities. These layers will make it possible to link WordNet senses and FrameNet frames within the same sentence into more complex semantic structures. Because named entities will often be the role fillers of FrameNet frames, they will also enrich the semantic and pragmatic information derivable from the sub-corpus. All MASC annotations will be published with detailed inter-annotator agreement measures. MASC and its annotations will be freely downloadable from the American National Corpus (ANC) website, providing maximum accessibility for researchers around the globe.
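The abstract does not say which inter-annotator agreement measures will be reported. As one illustration, assuming a pairwise chance-corrected measure such as Cohen's kappa, agreement between two annotators who each assign a sense label to the same set of word occurrences could be computed as sketched below. The function name `cohens_kappa` and the sense labels `s1`/`s2` are hypothetical. This is a minimal sketch of the standard kappa formula, not the MASC project's own procedure.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators labelling the same items (hypothetical helper)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labelled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's own label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one and the same label throughout; kappa is undefined.
        raise ValueError("chance agreement is 1; kappa is undefined")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical sense labels for six occurrences of one ambiguous word.
    annotator_a = ["s1", "s1", "s2", "s1", "s2", "s2"]
    annotator_b = ["s1", "s2", "s2", "s1", "s2", "s2"]
    print(round(cohens_kappa(annotator_a, annotator_b), 3))  # → 0.667
```

Here the annotators agree on five of six items (observed agreement 0.833), but their label distributions alone would produce 0.5 agreement by chance, so kappa is (0.833 - 0.5) / (1 - 0.5), or about 0.667. Chance correction of this kind is why agreement figures are usually reported as kappa-style coefficients rather than raw percent agreement.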