Chinese Spell Checking Based on Noisy Channel Model
Chinese spell checking is an important component of many Chinese NLP applications, including word processors, search engines, and automatic essay rating. Unlike English, Chinese text has no explicit word boundaries, and the various Chinese input methods produce different kinds of typos. As a result, developing a spell checker for Chinese is more difficult. In this paper, we introduce a novel method for correcting Chinese errors based on sound or shape similarity. In our approach, potential typos in a given sentence are corrected using a channel model and a character-based language model within the noisy channel framework. In the training phase, we estimate the channel probabilities for each character from n-grams in a Web corpus. At run-time, the system generates correction candidates for each character in the given sentence and selects the most appropriate correction using the channel model and the language model. The experimental results show that the proposed method achieves significantly better accuracy and recall than the more complicated methods in previous work.
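The run-time step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the confusion sets, channel probabilities, and bigram counts below are toy values invented for illustration, the language model is a plain add-one-smoothed character bigram model, and Viterbi search over the candidate lattice is our assumption about how the best sequence might be selected. The score of a candidate sequence is the sum of log P(observed character | intended character) from the channel model and log P(intended character | previous character) from the language model.

```python
import math

# Hypothetical confusion sets: sound- or shape-similar characters (toy data).
CONFUSION = {"在": ["再"], "再": ["在"]}

# Hypothetical channel probabilities P(observed | intended) for typos.
CHANNEL = {("在", "再"): 0.1, ("再", "在"): 0.1}
KEEP_PROB = 0.9  # assumed probability that a character was typed correctly

# Toy character bigram counts standing in for counts from a large corpus.
BIGRAM_COUNTS = {
    ("<s>", "明"): 50, ("明", "天"): 50,
    ("天", "再"): 20, ("天", "在"): 30,
    ("再", "見"): 40, ("在", "家"): 40,
    ("見", "</s>"): 40, ("家", "</s>"): 40,
}
VOCAB = {ch for pair in BIGRAM_COUNTS for ch in pair}


def lm_logprob(prev, cur):
    """Add-one-smoothed bigram log probability log P(cur | prev)."""
    num = BIGRAM_COUNTS.get((prev, cur), 0) + 1
    den = sum(n for (p, _), n in BIGRAM_COUNTS.items() if p == prev) + len(VOCAB)
    return math.log(num / den)


def channel_logprob(observed, intended):
    """Log probability log P(observed | intended) under the toy channel model."""
    if observed == intended:
        return math.log(KEEP_PROB)
    return math.log(CHANNEL.get((observed, intended), 1e-6))


def correct(sentence):
    """Return the most probable intended sentence via Viterbi search."""
    # best maps the last chosen character to (log score, path so far).
    best = {"<s>": (0.0, "")}
    for obs in sentence:
        # Candidates: the observed character plus its confusion set.
        candidates = [obs] + CONFUSION.get(obs, [])
        new_best = {}
        for cand in candidates:
            emit = channel_logprob(obs, cand)
            new_best[cand] = max(
                ((score + lm_logprob(prev, cand) + emit, path + cand)
                 for prev, (score, path) in best.items()),
                key=lambda t: t[0],
            )
        best = new_best
    # Close the sentence with the end-of-sentence transition.
    _, path = max(
        ((score + lm_logprob(prev, "</s>"), path)
         for prev, (score, path) in best.items()),
        key=lambda t: t[0],
    )
    return path


print(correct("明天在見"))  # 明天再見
print(correct("明天在家"))  # 明天在家
```

In the first sentence the language model's strong preference for 再見 outweighs the channel model's cost of assuming a typo, so 在 is replaced by 再. In the second sentence 在家 is already likely, so the input is left unchanged.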