A Context-Sensitive Approach to Find Optimum Language Model for Automatic Bangla Spelling Correction

Automated spelling correction is an important aid in typing, helping both literate and semi-literate people who use keyboards or similar input devices. It also helps students significantly in the learning process by supplying the proper words during word processing. A great deal of work has been done for English, but work on Bangla is still inadequate, and all Bangla approaches so far are context-free. Bangla is one of the most widely spoken languages (3.05% of the world population) and is ranked seventh among the world's languages. In this paper, we propose a context-sensitive approach to automated spelling correction in Bangla. We combine edit distance with a stochastic, i.e. N-gram, language model, using six N-gram models in total. A novel approach is deployed to find the optimum language model in terms of performance. In addition, to obtain better performance, a large Bangla corpus covering different word types is used. We achieve a promising accuracy of 87.58%.
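
As a rough illustration of how edit distance and an N-gram model can be combined, the following minimal Python sketch generates candidates within a small Levenshtein distance of the typed word and ranks them by a bigram probability conditioned on the preceding word. It is not the implementation evaluated in this paper: the choice of a single bigram model, add-one smoothing, the distance threshold `max_dist`, the function names, and the toy English corpus are all assumptions made for the example.

```python
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions, substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def train_bigram(sentences):
    """Count unigrams and bigrams; "<s>" marks the start of a sentence."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(prev_word, word, unigrams, bigrams, vocab_size):
    """P(word | prev_word) with add-one smoothing (an assumed choice)."""
    return (bigrams[(prev_word, word)] + 1) / (unigrams[prev_word] + vocab_size)


def correct(prev_word, typed, unigrams, bigrams, max_dist=1):
    """Return the in-vocabulary candidate that best fits the left context."""
    vocab = [w for w in unigrams if w != "<s>"]
    candidates = [w for w in vocab if edit_distance(typed, w) <= max_dist]
    if not candidates:
        return typed  # nothing close enough; leave the word unchanged
    return max(candidates,
               key=lambda w: bigram_prob(prev_word, w, unigrams, bigrams, len(vocab)))


# Toy corpus standing in for a large Bangla corpus (hypothetical data).
corpus = ["i read a book", "he read a book", "i cook rice", "they cook rice"]
unigrams, bigrams = train_bigram(corpus)

# The same misspelling is resolved differently depending on the previous word.
print(correct("a", "ook", unigrams, bigrams))
print(correct("i", "ook", unigrams, bigrams))
```

Both "book" and "cook" are one edit away from the typed string "ook", so edit distance alone cannot choose between them; the bigram score breaks the tie using the preceding word, which is the sense in which such a scheme is context-sensitive.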
