Syntactic annotation of non-canonical linguistic structures

This paper deals with the syntactic annotation of corpora that contain both ‘canonical’ and ‘non-canonical’ sentences. Consider Examples (1) and (2) from the German learner corpus Falko, which will be introduced below. Example (1) is a syntactically correct (although perhaps not very enlightening) utterance to which a syntactic structure is easily assigned. The utterance in (2), on the other hand, would be considered incorrect (and would probably be interpreted as a word order error), and it is much more difficult to assign a syntactic structure to it. The question is: how can (1) and (2) be annotated in a uniform way that shows that there is a difference and makes clear exactly where that difference lies?
