Clustering-based Bangla spell checker

Automatically detecting and correcting spelling errors is a significant research challenge. Developing a precise spell checker for the Bangla language, one that detects spelling errors and suggests corrections, is difficult because of the complex rules of Bangla spelling. In this paper, we propose a clustering-based spell checking technique for Bangla that reduces both the search space and the search time, thereby improving the performance of the spell checker. The proposed spell checker handles both typographical and phonetic errors. To evaluate the technique, we use 2,450 misspelled words; the proposed spell checker achieves a success rate of 99.8%. Compared with two existing Bangla spell checkers, Avro and Puspa, the proposed system provides relatively better results.
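To illustrate how clustering can shrink the search space, the following is a minimal Python sketch, not the method of the paper. The abstract does not state the clustering criterion, so the key used here (first character plus word length) is purely an assumption, as are the names `cluster_key`, `build_clusters`, and `suggest` and the parameters `max_distance` and `length_slack`. Candidates are ranked by plain Levenshtein distance, which only models typographical errors; phonetic errors would need a phonetic encoding that this sketch does not attempt.

```python
from collections import defaultdict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance over Unicode code points."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cluster_key(word: str) -> tuple:
    """Hypothetical clustering key (an assumption): first character and length."""
    return (word[0], len(word))


def build_clusters(dictionary):
    """Group dictionary words by their cluster key."""
    clusters = defaultdict(list)
    for word in dictionary:
        clusters[cluster_key(word)].append(word)
    return clusters


def suggest(word, clusters, max_distance=2, length_slack=1):
    """Compare the query only against words in nearby clusters."""
    first, length = cluster_key(word)
    candidates = []
    for n in range(max(1, length - length_slack), length + length_slack + 1):
        candidates.extend(clusters.get((first, n), []))
    scored = sorted((edit_distance(word, c), c) for c in candidates)
    return [c for d, c in scored if d <= max_distance]


# Tiny illustrative dictionary; a real system would load a full lexicon.
clusters = build_clusters(["বাংলা", "বাগান", "বানান", "ভাষা"])
print(suggest("বাংলি", clusters))
```

With this key, the query is never compared against words that start with a different character, which is where the reduction in search space and search time comes from. The trade-off is that an error in the first character sends the lookup to the wrong cluster, so a practical key would have to be more tolerant than the one assumed here.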
