A crucial figure of merit for a spelling checker is not only whether it detects misspelled words, but also how it ranks the suggestions it offers for each one. Spelling checkers based on edit-distance methods tend to produce a large number of candidates for a misspelled word. We propose an alternative approach to checking the spelling of Bangla text that uses a finite state automaton (FSA) to probabilistically create the suggestion list for a misspelled word. Finite state automata have proven effective for problems that require probabilistic solutions and high error tolerance. We start from a finite state representation of all the words in the Bangla dictionary. The algorithm then uses the state tables to test a string; if the string is erroneous, it tries to find all possible solutions by attempting single and multi-step transitions that consume one or more characters, using the subsequent characters as look-ahead. Finally, it uses backtracking to add each possible solution to the suggestion list. Because words are stored in a finite state representation, the algorithm is much more efficient for non-inflected forms; the gain is even more significant for nouns, since Bangla nouns are used heavily in their non-inflected form. For error detection and correction, the algorithm uses the statistics of Bangla error patterns and thus produces a small number of significant suggestions. One notable limitation is that it cannot treat a transposition as a single edit-distance error. This is less significant than it may seem, because transposition errors are less common in Bangla than other error types. This paper presents the structure and the algorithm needed to implement a practical Bangla spell-checker, and discusses the results obtained from the prototype implementation.
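The test-then-recover procedure described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the dictionary is a toy ASCII word list rather than Bangla, the automaton is a plain trie (one simple kind of deterministic FSA), and the names `build_fsa`, `accepts`, `suggest`, and the uniform `COST_*` weights are hypothetical. A real system would presumably replace the uniform weights with values derived from Bangla error-pattern statistics.

```python
from typing import Dict, List, Tuple


class State:
    """One FSA state: outgoing transitions plus a final-state flag."""

    def __init__(self) -> None:
        self.next: Dict[str, "State"] = {}
        self.final = False


def build_fsa(words: List[str]) -> State:
    """Build a trie-shaped deterministic FSA that accepts exactly `words`."""
    start = State()
    for w in words:
        s = start
        for ch in w:
            s = s.next.setdefault(ch, State())
        s.final = True
    return start


def accepts(start: State, word: str) -> bool:
    """Test a string by following transitions character by character."""
    s = start
    for ch in word:
        if ch not in s.next:
            return False
        s = s.next[ch]
    return s.final


# Hypothetical uniform error weights (assumption, not from the paper).
COST_SUB = 1.0   # wrong character typed
COST_SKIP = 1.0  # extra character typed: consume input, stay in the same state
COST_INS = 1.0   # character omitted: take a transition without consuming input


def suggest(start: State, word: str, budget: float = 1.0) -> List[Tuple[str, float]]:
    """Collect dictionary words reachable within `budget`, cheapest first."""
    best: Dict[str, float] = {}

    def walk(state: State, i: int, prefix: List[str], cost: float) -> None:
        if cost > budget:
            return  # prune: this path is already too expensive
        if i == len(word) and state.final:
            cand = "".join(prefix)
            if cost < best.get(cand, float("inf")):
                best[cand] = cost
        if i < len(word):
            walk(state, i + 1, prefix, cost + COST_SKIP)
        for ch, nxt in state.next.items():
            prefix.append(ch)
            if i < len(word):
                step = 0.0 if ch == word[i] else COST_SUB
                walk(nxt, i + 1, prefix, cost + step)
            walk(nxt, i, prefix, cost + COST_INS)
            prefix.pop()  # backtrack before trying the next transition

    walk(start, 0, [], 0.0)
    return sorted(best.items(), key=lambda kv: (kv[1], kv[0]))


fsa = build_fsa(["cat", "cart", "car", "coat", "dog"])
print(accepts(fsa, "cat"))   # True
print(suggest(fsa, "cst"))   # [('cat', 1.0)]
```

In this sketch a transposition such as `cta` for `cat` costs two substitutions, so `suggest(fsa, "cta")` returns nothing at the default budget and finds `cat` only when the budget is raised to 2.0. This mirrors the limitation noted in the abstract.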