Phrase Structure Annotation and Parsing for Learner English

Phrase structure annotation and parsing designed specifically for learner English have received almost no attention, even though both are useful for representing the structural characteristics of learner English. To address this gap, we first propose a phrase structure annotation scheme for learner English and use it to annotate two different learner corpora. Second, we show the usefulness of the annotated corpora by reporting (a) the inter-annotator agreement rate, (b) characteristic CFG rules in the corpora, and (c) parsing performance on them. In addition, we explore methods for improving phrase structure parsing of learner English, achieving an F-measure of 0.878. Finally, we publicly release the full annotation guidelines, the annotated data, and the improved parser model for learner English.
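For readers unfamiliar with the metric, the following is a minimal sketch of how a labeled bracketing F-measure can be computed from a gold tree and a predicted tree in Penn-Treebank-style bracket notation. It is a simplified illustration, not necessarily the evaluation setup behind the reported 0.878. The function names (`parse_tree`, `labeled_spans`, `bracket_f_measure`) and the example sentence are hypothetical.

```python
from collections import Counter


def parse_tree(s):
    """Parse a bracketed string into nested (label, children) tuples."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return read()


def labeled_spans(tree):
    """Count (label, start, end) spans of phrasal nodes; POS tags are skipped."""
    spans = Counter()

    def walk(node, start):
        label, children = node
        if len(children) == 1 and isinstance(children[0], str):
            return start + 1  # a preterminal covers exactly one word
        end = start
        for child in children:
            end = walk(child, end)
        spans[(label, start, end)] += 1
        return end

    walk(tree, 0)
    return spans


def bracket_f_measure(gold, pred):
    """Harmonic mean of labeled bracket precision and recall."""
    g = labeled_spans(parse_tree(gold))
    p = labeled_spans(parse_tree(pred))
    matched = sum((g & p).values())  # multiset intersection of spans
    precision = matched / sum(p.values()) if p else 0.0
    recall = matched / sum(g.values()) if g else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical learner sentence with a missing determiner ("I like apple").
gold = "(S (NP (PRP I)) (VP (VBP like) (NP (NN apple))))"
pred = "(S (NP (PRP I)) (VP (VBP like)) (NP (NN apple)))"
score = bracket_f_measure(gold, pred)
```

In this example the predicted tree attaches the object NP outside the VP, so three of the four brackets match on each side and `score` is 0.75. Standard evaluation tools apply further conventions, such as punctuation handling, that this sketch omits.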