Resources and Evaluations of Automated Chinese Error Diagnosis for Language Learners

Learners of Chinese as a foreign language (CFL) may produce inappropriate linguistic usages, including character-level confusions (commonly known as spelling errors) and word-, sentence-, and discourse-level grammatical errors. Chinese spelling errors frequently arise from confusion among multiple-character words that are phonologically and visually similar but semantically distinct. At a coarse-grained level, Chinese grammatical errors are classified by surface differences: missing components, redundant components, incorrect selection, and word ordering errors. Fine-grained error types additionally represent the morphological and syntactic category involved, such as verb, noun, preposition, conjunction, and adverb. Annotated learner corpora are important language resources for understanding these error patterns and for supporting the development of error diagnosis systems. In this chapter, we describe two representative Chinese learner corpora: the HSK Dynamic Composition Corpus constructed by Beijing Language and Culture University and the TOCFL Learner Corpus built by National Taiwan Normal University. In addition, we introduce two series of evaluations based on these learner corpora and designed for computer-assisted Chinese learning. One is the series of SIGHAN bakeoffs for Chinese spelling checkers. The other is the series of NLPTEA workshop shared tasks for Chinese grammatical error identification. The purpose of this chapter is to summarize these resources and evaluations in order to clarify the current research developments and challenges in automated Chinese error diagnosis for CFL learners.
