Semantic Annotation for Interlingual Representation of Multilingual Texts

This paper describes the annotation process being used in a multi-site project to create six sizable bilingual parallel corpora annotated with a consistent interlingua representation. After presenting the background and objectives of the effort, we describe the multilingual corpora and the three stages of interlingual representation being developed. We then focus on the annotation process itself, including an interface environment that supports the annotation task, and the methodology for evaluating the interlingua representation. Finally, we discuss some issues encountered during the annotation tasks. The resulting annotated multilingual corpora will be useful for a wide range of natural language processing research tasks, including machine translation, question answering, text summarization, and information extraction.
