Towards Universal Dependencies for Learner Chinese

We propose an annotation scheme for learner Chinese in the Universal Dependencies (UD) framework. The scheme was adapted from a UD scheme for Mandarin Chinese to take interlanguage characteristics into account. We applied the scheme to a set of 100 sentences written by learners of Chinese as a foreign language, and we report inter-annotator agreement on syntactic annotation.
