This paper describes the Error-Annotated German Learner Corpus (EAGLE), a corpus of beginning learner German with grammatical error annotation. The corpus contains online workbook and hand-written essay data from learners in introductory German courses at The Ohio State University. We introduce an error typology developed for beginning learners of German that focuses on the linguistic properties of lexical items present in the learner data, and we present the detailed error typologies for selection, agreement, and word order errors. The corpus uses an error annotation format that extends the multi-layer standoff format proposed by Lüdeling et al. (2005) to include incremental target hypotheses for each error. In this format, each annotated error records the location of the tokens affected by the error, the error type, and the proposed target correction. The multi-layer standoff format allows us to annotate ambiguous errors with more than one possible target correction and to annotate the multiple, overlapping errors common in beginning learner productions.
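The standoff idea described above can be illustrated with a minimal sketch. It is not EAGLE's actual file format: the class name `ErrorAnnotation`, the helper functions, the error-type labels, and the example sentence are all assumptions made for illustration. Each annotation lives outside the token sequence and points to a token span, so one span can carry several target corrections and different errors can overlap.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ErrorAnnotation:
    """One standoff error annotation (hypothetical structure)."""
    start: int          # index of the first affected token (inclusive)
    end: int            # index just past the last affected token (exclusive)
    error_type: str     # illustrative label, not the corpus's real tag set
    targets: List[str]  # one or more proposed target corrections


def overlaps(a: ErrorAnnotation, b: ErrorAnnotation) -> bool:
    """True if the two annotations share at least one token."""
    return a.start < b.end and b.start < a.end


def errors_at(annotations: List[ErrorAnnotation], index: int) -> List[ErrorAnnotation]:
    """All annotations whose span covers the token at `index`."""
    return [a for a in annotations if a.start <= index < a.end]


# Invented learner sentence; the tokens themselves are never modified.
tokens = ["Ich", "die", "Mann", "sehe"]

# Word order error: the finite verb should come second.
# The target fixes only the word order, leaving the other error in place.
word_order = ErrorAnnotation(1, 4, "word_order", ["sehe die Mann"])

# Agreement error inside the noun phrase, ambiguous between two corrections.
agreement = ErrorAnnotation(1, 3, "agreement", ["den Mann", "die Männer"])

annotations = [word_order, agreement]

for ann in annotations:
    span = " ".join(tokens[ann.start:ann.end])
    print(f"{ann.error_type}: '{span}' -> {ann.targets}")
```

Because the annotations refer to tokens by index rather than being embedded in the text, the two errors can cover the same tokens ("die Mann") without conflict, and the ambiguous agreement error simply lists both candidate targets.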
[1] Anke Lüdeling, et al. Multi-level error annotation in learner corpora, 2005.
[2] Sylviane Granger, et al. Error-tagged learner corpora and CALL: a promising synergy, 2003.
[3] Peter Siemen, et al. FALKO - Ein fehlerannotiertes Lernerkorpus des Deutschen, 2006.
[4] Anne Rimrott, et al. Spell Checking in Computer-Assisted Language Learning: A Study of Misspellings by Nonnative Writers of German, 2005.
[5] Thomas C. Schmidt. The transcription system EXMARaLDA: An application of the annotation graph formalism as the basis of a database of multilingual spoken discourse, 2001.
[6] Margaret Rogers, et al. On Major Types of Written Error in Advanced Students of German, 1984.
[7] Vilius Juozulynas. Errors in the Compositions of Second-Year German Students: An Empirical Study for Parser-Based ICALI, 1994.