An Automatic Spelling Correction Method for Classical Mongolian

Classical Mongolian suffers from a serious misspelling problem caused by its polyphonic alphabet. One Mongolian glyph can map to different letters, i.e., some letters display the same shape. This encoding scheme makes words very easy to misspell. Roughly half to three quarters of the words in classical Mongolian text are misspelled, mainly because of confusion between letters of the same shape. Conventional spelling correction techniques handle such errors poorly: the misspelled words still display the correct shape, whereas these techniques mostly target errors such as character insertion, deletion, and transposition. In this work, we propose intermediate codes that map words of the same shape into a single shape-based intermediate representation. Based on the corresponding shapes, a hybrid approach that integrates rules with a neural representation model (context2vec) is then applied to recover the correct spellings. The experimental results show that this approach achieves new state-of-the-art performance. In addition, we develop an efficient and free Mongolian automatic correction system for text editors.
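The idea of a shape-based intermediate code can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the system described here: the `SHAPE_CLASS` table, the toy Latin-letter strings, and the function names are all hypothetical. The `score` callback merely stands in for the rule-based and context2vec ranking, which is not reproduced.

```python
from collections import defaultdict

# Hypothetical shape classes: letters assumed to be drawn with the same
# glyph are collapsed into one intermediate symbol. A real table would be
# derived from the script itself and would depend on position in the word.
SHAPE_CLASS = {"o": "O", "u": "O", "t": "T", "d": "T"}


def intermediate_code(word):
    """Map a word to its shape-based intermediate code."""
    return "".join(SHAPE_CLASS.get(ch, ch) for ch in word)


def build_index(lexicon):
    """Group correctly spelled words by their intermediate code."""
    index = defaultdict(list)
    for entry in lexicon:
        index[intermediate_code(entry)].append(entry)
    return index


def candidates(word, index):
    """Return all lexicon words that share the input word's shape."""
    return index.get(intermediate_code(word), [])


def correct(word, index, score):
    """Pick the best same-shape candidate according to `score`.

    `score` is a placeholder for a ranking model; the word is returned
    unchanged when the lexicon has no entry with the same shape.
    """
    options = candidates(word, index)
    if not options:
        return word
    return max(options, key=score)
```

Under these assumptions, a string typed with the wrong same-shape letters resolves to the same code as the lexicon entry, so correction reduces to choosing among the entries that share that code. When several entries share one code, the choice depends entirely on the ranking function, which is where rules or a context model would come in.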