A LAF/GrAF based Encoding Scheme for underspecified Representations of syntactic Annotations

Data models and encoding formats for syntactically annotated text corpora need to deal with syntactic ambiguity; underspecified representations are particularly well suited for the representation of ambiguous data because they allow for high informational efficiency. We discuss the issue of being informationally efficient, and the trade-off between efficient encoding of linguistic annotations and complete documentation of linguistic analyses. The main topic of this article is a data model and an encoding scheme based on LAF/GrAF (Ide and Romary, 2006; Ide and Suderman, 2007) which provides a flexible framework for encoding underspecified representations. We show how a set of dependency structures and a set of TiGer graphs (Brants et al., 2002) representing the readings of an ambiguous sentence can be encoded, and we discuss basic issues in querying corpora which are encoded using the framework presented here.
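The notion of informational efficiency can be illustrated with a minimal sketch. It is not the LAF/GrAF XML encoding and not the scheme proposed in the article; it only shows the general idea of storing material shared by all readings once and keeping the differing parts as alternatives. The class `PackedAnalysis`, the function `pack`, the token identifiers, and the edge labels are all hypothetical names chosen for this illustration, as is the example sentence ("I saw the man with the telescope").

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    """One labelled dependency edge (hypothetical, simplified)."""
    head: str
    dependent: str
    label: str


@dataclass
class PackedAnalysis:
    """Edges common to all readings, plus the edges specific to each reading."""
    shared: set = field(default_factory=set)
    alternatives: dict = field(default_factory=dict)

    def reading(self, reading_id):
        # Reconstruct one fully specified reading from the packed form.
        return self.shared | self.alternatives[reading_id]


def pack(readings):
    """Factor a dict {reading id: set of Edge} into shared and reading-specific edges."""
    shared = set.intersection(*readings.values())
    specific = {rid: edges - shared for rid, edges in readings.items()}
    return PackedAnalysis(shared, specific)


# Edges that both readings of the example sentence agree on (assumed analysis).
common = {
    Edge("saw_2", "I_1", "subj"),
    Edge("saw_2", "man_4", "obj"),
    Edge("man_4", "the_3", "det"),
    Edge("with_5", "telescope_7", "pobj"),
    Edge("telescope_7", "the_6", "det"),
}

# The two readings differ only in where the prepositional phrase attaches.
readings = {
    "verb_attach": common | {Edge("saw_2", "with_5", "mod")},
    "noun_attach": common | {Edge("man_4", "with_5", "mod")},
}

packed = pack(readings)
stored_edges = len(packed.shared) + sum(len(e) for e in packed.alternatives.values())
enumerated_edges = sum(len(e) for e in readings.values())
```

In this toy case the packed form stores 7 edges where listing both readings in full would store 12, and each reading can still be recovered with `packed.reading(...)`. The trade-off mentioned above shows up here as well: the packed form is smaller, but an individual analysis is no longer documented in one place.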

[1]  Katrin Erk, et al. The SALSA Corpus: a German Corpus Resource for Lexical Semantics, 2006, LREC.

[2]  Richard Eckart, et al. An XML-based data model for flexible representation and query of linguistically interpreted corpora, 2007.

[3]  Mark Liberman, et al. A formal framework for linguistic annotation, 1999, Speech Commun.

[4]  Charles J. Fillmore, et al. The Structure of the Framenet Database, 2003.

[5]  Ingo Schröder, et al. Natural language parsing with graded constraints, 2002.

[6]  Michael Schiehlen. A Cascaded Finite-State Parser for German, 2003, EACL.

[7]  Steven J. DeRose, et al. XML Path Language (XPath) Version 1.0, 1999.

[8]  Jonathan G. Fiscus, et al. A Practical Introduction to ATLAS, 2002, LREC.

[9]  Helmut Schmid. Efficient Parsing of Highly Ambiguous Context-Free Grammars with Bit Vectors, 2004, COLING.

[10]  Jochen Dörre. Efficient construction of underspecified semantics under massive ambiguity, 1997.

[11]  Sabine Brants, et al. The TIGER Treebank, 2001.

[12]  Nancy Ide, et al. Representing Linguistic Corpora and Their Annotations, 2006, LREC.

[13]  Jochen Dörre. Efficient Construction of Underspecified Semantics under Massive Ambiguity, 1997, ACL.

[14]  Dan Flickinger, et al. Minimal Recursion Semantics: An Introduction, 2005.

[15]  Uwe Reyle, et al. Dealing with Ambiguities by Underspecification: Construction, Representation and Deduction, 1993, J. Semant.

[16]  Nancy Ide, et al. GrAF: A Graph-based Format for Linguistic Annotations, 2007, LAW@ACL.

[17]  Jean Carletta, et al. A generic approach to software support for linguistic annotation using XML, 2005.

[18]  Jonathan G. Fiscus, et al. A Practical Introduction to ATLAS, 2002.