On Heads and Coordination in Valence Acquisition

The aim of this paper is to present the design of a partial syntactic annotation of the IPI PAN Corpus of Polish [22] and the corresponding extension of the corpus search engine Poliqarp [25,12], developed at the Institute of Computer Science PAS and currently employed in Polish and Portuguese corpus projects. In particular, we argue for the need to distinguish between, and represent both, syntactic and semantic heads, and we sketch the representation of coordination, an area traditionally controversial in both theoretical and computational linguistics. The annotation is designed to maximise the usefulness of the resulting corpus for the task of automatic valence acquisition.

[1]  Zygmunt Saloni,et al.  Składnia współczesnego języka polskiego , 1987 .

[2]  Ivan A. Sag, Gerald Gazdar, Thomas Wasow, Steven Weisler.  Coordination and How to Distinguish , .

[3]  Adam Przepiórkowski,et al.  Case Assignment and the Complement/Adjunct Dichotomy: A Non-Configurational Constraint-Based Approach , 1999 .

[4]  Amália Mendes,et al.  Open Resources and Tools for the Shallow Processing of Portuguese: The TagShare Project , 2006, LREC.

[5]  Ted Briscoe,et al.  Efficient Extraction of Grammatical Relations , 2005, IWPT.

[6]  Chu-Ren Huang,et al.  Sinica Treebank: Design Criteria, Annotation Guidelines, and On-line Interface , 2000, ACL 2000.

[7]  Oliver Christ,et al.  A Modular and Flexible Architecture for an Integrated Corpus Query System , 1994, ArXiv.

[8]  Ivan A. Sag,et al.  Book Reviews: Head-driven Phrase Structure Grammar and German in Head-driven Phrase-structure Grammar , 1996, CL.

[9]  Nancy Ide,et al.  XCES: An XML-based Encoding Standard for Linguistic Corpora , 2000, LREC.

[10]  Piotr Banski,et al.  A Search Tool for Corpora with Positional Tagsets and Ambiguities , 2004, LREC.

[11]  Andreas Kathol,et al.  When a Head is not a Head: A Constructional Approach to Exocentricity in English , 2003 .

[12]  Adam Przepiórkowski.  Automatic Extraction of Polish Verb Subcategorization: An Evaluation of Common Statistics , 2005 .

[13]  Hiroaki Sato,et al.  Seeing Arguments through Transparent Structures , 2002, LREC.

[14]  Ivan A. Sag,et al.  Information-based syntax and semantics , 1987 .

[15]  Max Silberztein,et al.  Finite-State Description of the French Determiner system , 2003, Journal of French Language Studies.

[16]  Christopher R. Johnson,et al.  Background to Framenet , 2003 .

[17]  Lucien Tesnière.  Éléments de syntaxe structurale , 1959 .

[18]  Petr Sgall,et al.  The Meaning Of The Sentence In Its Semantic And Pragmatic Aspects , 1986 .

[19]  Adam Przepiórkowski,et al.  Baseline Experiments in the Extraction of Polish Valence Frames , 2005, Intelligent Information Systems.

[20]  Ludwig M. Eichinger,et al.  Dependenz und Valenz : ein internationales Handbuch der zeitgenössischen Forschung , 2003 .

[21]  Adam Przepiórkowski,et al.  A Flexemic Tagset for Polish , 2003 .

[22]  Ivan A. Sag,et al.  Coordinate ellipsis and apparent non-constituent coordination , 2004, Proceedings of the International Conference on Head-Driven Phrase Structure Grammar.

[23]  Joakim Nivre,et al.  Theory-supporting treebanks , 2003 .

[24]  Jan Hajic,et al.  The Prague Dependency Treebank , 2003 .

[25]  Adam Przepiórkowski.  On Heads and Coordination in a Partial Treebank , 2006 .

[26]  M. de Rijke,et al.  Tequesta: The University of Amsterdam's Textual Question Answering System , 2001, TREC.

[27]  Ivan A. Sag,et al.  Information-Based Syntax and Semantics: Volume 1, Fundamentals , 1987 .

[28]  Adam Przepiórkowski,et al.  The Unbearable Lightness of Tagging: A Case Study in Morphosyntactic Tagging of Polish , 2003, LINC@EACL.

[29]  Anne Abeillé,et al.  Treebanks: Building and Using Parsed Corpora , 2003 .

[30]  António Branco,et al.  A Suite of Shallow Processing Tools for Portuguese: LX-Suite , 2006, EACL.