HANSpeller: A Unified Framework for Chinese Spelling Correction

The number of people learning Chinese as a Foreign Language (CFL) has grown rapidly in recent decades, and spelling error correction for CFL learners is becoming increasingly important. Compared with the regular text spelling check task, CFL cases involve more error types. In this paper, we propose a unified framework for Chinese spelling correction. Unlike conventional methods, which rely on either rules or statistics alone, our approach combines an extended HMM and ranker-based models with a rule-based model for further polishing, followed by a final decision-making step that determines whether to output the corrections. Experimental results on the test data of Chinese essays written by foreign learners, provided by the SIGHAN 2014 bake-off, illustrate the performance of our approach.
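The staged design described in the abstract can be pictured with a minimal Python sketch. This is not the authors' implementation: the confusion set, the bigram scores, the rewrite rule, and the margin threshold are all hypothetical toy values, and simple character substitution plus a lookup score stand in for the extended HMM and the ranker. It only shows how candidate generation, ranking, rule-based polishing, and a final output decision might be chained.

```python
from typing import Dict, List, Tuple

# Hypothetical toy resources; a real system would learn these from data.
CONFUSION: Dict[str, List[str]] = {"在": ["再"], "再": ["在"]}
BIGRAM_SCORE: Dict[str, float] = {"再见": 2.0, "现在": 2.0}
RULES: List[Tuple[str, str]] = [("的很", "得很")]


def generate_candidates(sentence: str) -> List[str]:
    """Stage 1 (toy stand-in for the extended HMM): single-character substitutions."""
    candidates = [sentence]
    for i, ch in enumerate(sentence):
        for alt in CONFUSION.get(ch, []):
            candidates.append(sentence[:i] + alt + sentence[i + 1:])
    return candidates


def score(sentence: str) -> float:
    """Stage 2 (toy stand-in for the ranker): sum scores of adjacent character pairs."""
    return sum(
        BIGRAM_SCORE.get(sentence[i:i + 2], 0.0)
        for i in range(len(sentence) - 1)
    )


def polish(sentence: str) -> str:
    """Stage 3: apply hand-written rewrite rules to the top candidate."""
    for wrong, right in RULES:
        sentence = sentence.replace(wrong, right)
    return sentence


def correct(sentence: str, margin: float = 1.0) -> str:
    """Run all stages; the margin test is an assumed form of the decision step."""
    best = polish(max(generate_candidates(sentence), key=score))
    # Stage 4: output the correction only if it beats the input by the margin.
    return best if score(best) - score(sentence) >= margin else sentence
```

Under these toy resources, `correct("在见")` yields `"再见"`, while an already well-formed input such as `"现在"` is returned unchanged because no candidate clears the margin.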
