Avoiding the Comparative Fallacy in the Annotation of Learner Corpora

Corpora of second language learner data are increasingly used to support research on second language acquisition (SLA) topics (e.g., Römer, 2009; Wulff et al., 2009), but corpus annotation has seen little use. For many questions in SLA research, using a corpus is straightforward and requires no annotation: one can search the corpus for specific words to find relevant examples. For example, to examine how L2 learners use modal verbs (cf., e.g., Aijmer, 2002), one can search for the specific lexical items (can, should, etc.) and analyze the output by hand. Consider, however, a search for syntactic patterns, such as an examination of wh-movement (e.g., Juffs, 2005; Wolfe-Quintero, 1992; Schachter, 1989). Questions of this type require more linguistic abstraction (cf., e.g., Lüdeling, 2010). If we take the learner sentence (1), for example, what kind of search involving specific words addresses questions about the function of whom? If we search for all instances of whom in a corpus, we still have to determine for each hit whether it is a relative clause marker, whether it involves subject or object extraction, and what the depth of embedding is; we then need to do the same for that, which, or even other prepositional objects. Answering such questions requires data marked with syntactic annotation.
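The contrast between lexical search and syntactic search can be made concrete with a minimal sketch. The toy sentences, the modal list, and the helper names (`tokenize`, `find_lexical_items`) are all assumptions introduced for illustration; they are not drawn from any actual learner corpus or from a particular corpus tool.

```python
import re

# Hypothetical toy corpus; these sentences are invented for illustration only.
LEARNER_SENTENCES = [
    "I can speak English a little.",
    "You should to go there tomorrow.",
    "The man whom I met yesterday is my teacher.",
    "Whom did you say that she saw?",
]

# An assumed (not exhaustive) list of English modal verbs.
MODALS = {"can", "could", "may", "might", "must", "shall", "should", "will", "would"}


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def find_lexical_items(sentences, items):
    """Return (sentence_index, token) pairs for every token found in `items`."""
    hits = []
    for index, sentence in enumerate(sentences):
        for token in tokenize(sentence):
            if token in items:
                hits.append((index, token))
    return hits


# A word-based search suffices for the modal-verb question.
modal_hits = find_lexical_items(LEARNER_SENTENCES, MODALS)

# The same search finds every "whom", but the hits carry no structural
# information: nothing distinguishes the relative clause marker in
# sentence 2 from the question word in sentence 3.
whom_hits = find_lexical_items(LEARNER_SENTENCES, {"whom"})

print(modal_hits)
print(whom_hits)
```

The modal hits can be analyzed by hand directly. The whom hits, by contrast, are bare (index, word) pairs, so clause type, extraction site, and depth of embedding would each have to be determined manually for every hit unless the corpus carries syntactic annotation.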

[1]  Adriane Boyd,et al.  EAGLE: an Error-Annotated Corpus of Beginning Learner German , 2010, LREC.

[2]  C. Chapelle,et al.  Natural Language Processing and Language Learning , 2012 .

[3]  Manfred Pienemann,et al.  Constructing an Acquisition-Based Procedure for Second Language Assessment , 1988, Studies in Second Language Acquisition.

[4]  Alon Lavie,et al.  Adding Syntactic Annotations to Transcripts of Parent-Child Dialogs , 2004, LREC.

[5]  Ann Bies,et al.  Bracketing Guidelines For Treebank II Style Penn Treebank Project , 1995 .

[6]  Shinichi Izumi,et al.  Processing Difficulty in Comprehension and Production of Relative Clauses by Learners of English as a Second Language , 2003 .

[7]  Harald Clahsen,et al.  The availability of universal grammar to adult and child learners - a study of the acquisition of German word order , 1986 .

[8]  Karin Aijmer,et al.  Modality in advanced Swedish learners’ written interlanguage , 2002 .

[9]  Alan Juffs The influence of first language on the processing of wh-movement in English as a second language , 2005 .

[10]  D Nicholls,et al.  The Cambridge Learner Corpus-Error coding and analysis , 1999 .

[11]  Nina Spada,et al.  How languages are learned , 1995 .

[12]  Walt Detmar Meurers,et al.  Towards interlanguage POS annotation for effective learner corpora in SLA and FLT , 2009 .

[13]  Beatrice Santorini,et al.  Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[14]  Christian Adjémian,et al.  ON THE NATURE OF INTERLANGUAGE SYSTEMS , 1976 .

[15]  Stefano Rastelli Learner Corpora without Error Tagging , 2013 .

[16]  Seok Bae Jang,et al.  Annotation of Korean Learner Corpora for Particle Error Detection , 2013 .

[17]  Rex A. Sprouse,et al.  Word Order and Nominative Case in Non-Native Language Acquisition: A longitudinal study of (L1 Turkish) German Interlanguage , 1994 .

[18]  James F. Allen,et al.  Draft of DAMSL Dialog Act Markup in Several Layers , 2007 .

[19]  Ines Rehbein,et al.  Syntactic Misuse, Overuse and Underuse: A Study of a Parsed Learner Corpus and its Target Hypothesis , 2010 .

[20]  Kari Tenfjord,et al.  The "Hows" and the "Whys" of Coding Categories in a Learner Corpus (or "How and Why an Error-Tagged Learner Corpus is not 'ipso facto' One Big Comparative Fallacy") , 2006 .

[21]  Markus Dickinson,et al.  Dependency Annotation for Learner Corpora , 2009 .

[22]  Martin Wynne,et al.  Developing Linguistic Corpora: a Guide to Good Practice , 2005 .

[23]  Kenneth Wexler,et al.  Language acquisition studies in generative grammar : papers in honor of Kenneth Wexler from the 1991 GLOW workshops , 1994 .

[24]  Ana Díaz-Negrillo,et al.  ERROR TAGGING SYSTEMS FOR LEARNER CORPORA , 2006 .

[25]  Alexandr Rosen,et al.  Error-Tagged Learner Corpus of Czech , 2010, Linguistic Annotation Workshop.

[26]  Robert W. Bley-Vroman THE COMPARATIVE FALLACY IN INTERLANGUAGE STUDIES: THE CASE OF SYSTEMATICITY , 1983 .

[27]  Sylviane Granger,et al.  Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching , 2002 .

[28]  Sylviane Granger,et al.  Computer learner corpus research: current status and future prospects , 2004 .

[29]  W. O'grady,et al.  A SUBJECT-OBJECT ASYMMETRY IN THE ACQUISITION OF RELATIVE CLAUSES IN KOREAN AS A SECOND LANGUAGE , 2003, Studies in Second Language Acquisition.

[30]  Geoffrey Leech,et al.  Adding linguistic annotation. , 2005 .

[31]  Mitchell P. Marcus,et al.  Adding Semantic Annotation to the Penn TreeBank , 1998 .

[32]  Walt Detmar Meurers Compiling a Task-Based Corpus for the Analysis of Learner Language in Context , 2009 .

[33]  Stefanie Wulff,et al.  The acquisition of tense-aspect: Converging evidence from corpora and telicity ratings , 2009 .

[34]  Usha Lakshmanan,et al.  Analysing interlanguage: how do we know what learners know? , 2001 .

[35]  Kate Wolfe-Quintero Learnability and the Acquisition of Extraction in Relative Clauses and Wh-Questions , 1992, Studies in Second Language Acquisition.

[36]  Sunyoung Lee-Ellis,et al.  THE ELICITED PRODUCTION OF KOREAN RELATIVE CLAUSES BY HERITAGE SPEAKERS , 2011, Studies in Second Language Acquisition.

[37]  Kathleen F. McCoy,et al.  A Methodology for Developing an Error Taxonomy for a Computer Assisted Language Learning Tool for Se , 1993 .

[38]  Belma Haznedar,et al.  The status of functional categories in child second language acquisition: evidence from the acquisition of CP , 2003 .

[39]  Orsolya Vincze,et al.  Towards a Motivated Annotation Schema of Collocation Errors in Learner Corpora , 2010, LREC.

[40]  Alon Lavie,et al.  High-accuracy Annotation and Parsing of CHILDES Transcripts , 2007 .

[41]  Manfred Pinkal,et al.  Towards a Resource for Lexical Semantics: A Large German Corpus with Extensive Semantic Annotation , 2003, ACL.

[42]  Lydia White,et al.  UG or not UG, that is the question: a reply to Clahsen and Muysken , 1987 .

[43]  Sylviane Granger,et al.  Error-tagged learner corpora and CALL: a promising synergy , 2003 .

[44]  Anke Lüdeling,et al.  Multi-level error annotation in learner corpora , 2005 .

[45]  Heejeong Ko,et al.  Article Semantics in L2 Acquisition: The Role of Specificity , 2004 .

[46]  Wojciech Skut,et al.  An Annotation Scheme for Free Word Order Languages , 1997, ANLP.

[47]  Dan Roth,et al.  Annotating ESL Errors: Challenges and Rewards , 2010 .

[48]  Geoffrey Sampson,et al.  English for the Computer: The SUSANNE Corpus and Analytic Scheme , 1995, Computational Linguistics.

[49]  U. Römer The inseparability of lexis and grammar: Corpus linguistic perspectives , 2009 .

[50]  Eileen Fitzpatrick,et al.  The Montclair Electronic Language Database Project , 2004 .

[51]  Rod Ellis,et al.  The Study of Second Language Acquisition , 1994 .

[52]  Daniel Gildea,et al.  The Proposition Bank: An Annotated Corpus of Semantic Roles , 2005, CL.