Propbank-Br: a Brazilian Treebank annotated with semantic role labels

This paper reports the annotation of a Brazilian Portuguese Treebank with semantic role labels following Propbank guidelines. Working with a different language and a different parser output affects the task and requires decisions on how to annotate the corpus. Therefore, a new annotation guide, called Propbank-Br, was written to deal with language-specific phenomena and parser problems. In this phase of the project, the corpus was annotated by a single linguist. The annotation task reported here is part of a larger project for the Brazilian Portuguese language. This project aims to build frame files for Brazilian Portuguese verbs and to carry out a broader, distributed annotation of semantic role labels in Brazilian Portuguese, allowing inter-annotator agreement to be measured. The corpus, available on the web, is already being used to build a semantic tagger for Portuguese.
