Chinese Word Ordering Errors Detection and Correction for Non-Native Chinese Language Learners

Word ordering errors (WOEs) are the most frequent type of sentence-level grammatical error made by non-native learners of Chinese. Learners of Chinese as a foreign language often misplace characters within a sentence, producing wrong words or ungrammatical sentences. Moreover, Chinese sentences have no explicit word boundaries, which makes WOE detection and correction more challenging. In this paper, we propose methods to detect and correct WOEs in Chinese sentences. WOE detection models based on conditional random fields (CRFs) identify the sentence segments that contain WOEs. We explore the following features: segment point-wise mutual information (PMI), inter-segment PMI difference, language model, tag of the previous segment, and the CRF bigram template. Words in the segments containing WOEs are then reordered to generate candidates that may have the correct word order. Ranking SVM based models rank these candidates and suggest the most suitable corrections. The training and test sets are selected from the HSK dynamic composition corpus created by Beijing Language and Culture University. In addition to the HSK WOE dataset, the Google Chinese Web 5-gram corpus is used to learn features for WOE detection and correction. The best model achieves an accuracy of 0.834 for detecting WOEs in sentence segments. On average, the correct word ordering is ranked 4.8 among 184.48 candidates.
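The candidate generation and ranking step can be illustrated with a minimal sketch. It is not the authors' implementation: the counts below are invented rather than taken from the Google Chinese Web 5-gram corpus, the function names (`pmi`, `score`, `candidates`, `rank`) are hypothetical, and a simple sum of adjacent-word PMI stands in for the Ranking SVM model described in the abstract.

```python
import math
from itertools import permutations

# Toy counts, invented for illustration only (NOT from any real corpus).
UNIGRAMS = {"我": 50, "今天": 30, "去": 40, "学校": 20}
BIGRAMS = {("我", "今天"): 12, ("今天", "去"): 10, ("去", "学校"): 15}
TOTAL = 1000           # hypothetical corpus size
UNSEEN_PMI = -10.0     # assumed floor score for word pairs never seen together


def pmi(x, y):
    """Point-wise mutual information of the adjacent word pair (x, y)."""
    joint = BIGRAMS.get((x, y), 0)
    if joint == 0:
        return UNSEEN_PMI
    p_xy = joint / TOTAL
    p_x = UNIGRAMS[x] / TOTAL
    p_y = UNIGRAMS[y] / TOTAL
    return math.log(p_xy / (p_x * p_y))


def score(words):
    """Sum of PMI over adjacent pairs; a stand-in for a learned ranker."""
    return sum(pmi(a, b) for a, b in zip(words, words[1:]))


def candidates(words):
    """All reorderings of the words in a segment flagged as containing a WOE."""
    return [list(p) for p in permutations(words)]


def rank(words):
    """Candidates sorted from most to least plausible under the toy score."""
    return sorted(candidates(words), key=score, reverse=True)


if __name__ == "__main__":
    erroneous = ["今天", "我", "去", "学校"]  # already word-segmented input
    for cand in rank(erroneous)[:3]:
        print("".join(cand), round(score(cand), 2))
```

Because the number of permutations grows factorially with segment length, exhaustive enumeration like this is only practical for short segments; the sketch also assumes the input has already been segmented into words.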
