Reversed word dictionary and phonetically similar word grouping based spell-checker for Bangla text

A novel technique for localising and correcting non-word errors is described. The technique works in two stages. The first stage handles phonetic similarity errors: phonetically similar characters are mapped to a single character code, and a new dictionary Dc is constructed over this reduced alphabet. A word that is phonetically similar to a valid word but wrongly spelt can be easily corrected using this dictionary. The second stage handles errors other than phonetic similarity. Here a wrongly spelt word S of n characters is searched in the dictionary Dc. If S is a non-word, its first k1 ≤ n characters will match the leading characters of a valid word in Dc (if k1 = n, the matching word in Dc must be longer than n characters). A reversed word dictionary Dr is also generated, in which the characters of each word are stored in reverse order. If the last k2 characters of S match a word in Dr then, for a single error, the error is located within the intersection of the first k1 + 1 and the last k2 + 1 characters of S. We observed that in most cases this region is very small compared with the word length, and that the number of suggested corrections can be drastically reduced using this information. We have applied our approach to correcting Bangla text, where the problem of inflection is also tackled.
