Extended HMM and Ranking Models for Chinese Spelling Correction

Spelling correction has been studied for many decades and can be classified into two categories: (1) regular text spelling correction and (2) query spelling correction. Although the two tasks share many techniques, they have different concerns. This paper presents our work on the CLP-2014 bake-off, whose task focuses on spelling checking of Chinese essays written by foreigners. Compared with the online search query spelling checking task, more complicated techniques can be applied here for better performance. We therefore propose a unified framework for Chinese essay spelling correction based on extended HMM and ranker-based models, together with a rule-based model for further polishing. Our system showed better performance on the test dataset.