The filtered combination of the weighted edit distance and the Jaro-Winkler distance to improve spellchecking Arabic texts

Digital environments for human learning have evolved considerably thanks to the rapid progress of information technologies. This is particularly true of automatic spelling correction, which a large majority of users now rely on. Almost all current spellcheckers are semi-automatic: they help users find the right correction for a detected error. The major shortcoming of existing metric-based correction methods is the poor ranking of the solutions they suggest when an error is corrected out of context. To overcome this limitation, we previously developed several approaches that assign probabilistic costs, estimated during a learning phase, to the various editing operations used when computing a similarity measure such as the edit distance. The idea developed in this work is to weight these editing operations efficiently without resorting to a learning phase, relying only on the proximity and similarity between Arabic keyboard keys. In addition, we propose combining this measure with the Jaro-Winkler distance in order to better filter, refine and rank the candidate solutions. The experimental results come from tests conducted on errors committed in a learning corpus; they aim to validate the design choices and to demonstrate the value of both approaches.
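To make the idea concrete, the following Python sketch shows one way a keyboard-weighted edit distance could be combined with a Jaro-Winkler filter. It is an illustration under stated assumptions, not the scheme evaluated in the paper: the two keyboard rows, the 0.5 cost for neighbouring keys, the 0.7 filtering threshold and the names `sub_cost`, `weighted_edit_distance` and `rank_candidates` are all hypothetical choices made for this example.

```python
# Assumption: top and home rows of a common Arabic keyboard layout, left to right.
# The proximity and similarity tables used in the paper are not reproduced here.
ROWS = ["ضصثقفغعهخحجد", "شسيبلاتنمكط"]
ADJACENT = {frozenset(pair) for row in ROWS for pair in zip(row, row[1:])}


def sub_cost(a, b):
    """Substitution cost: 0 if equal, 0.5 for neighbouring keys (assumed), else 1."""
    if a == b:
        return 0.0
    return 0.5 if frozenset((a, b)) in ADJACENT else 1.0


def weighted_edit_distance(s, t):
    """Levenshtein distance with unit insert/delete and keyboard-weighted substitution."""
    prev = [float(j) for j in range(len(t) + 1)]
    for i, cs in enumerate(s, 1):
        cur = [float(i)]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1.0,                    # deletion
                           cur[j - 1] + 1.0,                 # insertion
                           prev[j - 1] + sub_cost(cs, ct)))  # substitution
        prev = cur
    return prev[-1]


def jaro(s, t):
    """Standard Jaro similarity in [0, 1]."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_hit, t_hit = [False] * len(s), [False] * len(t)
    matches = 0
    for i, ch in enumerate(s):
        for j in range(max(0, i - window), min(len(t), i + window + 1)):
            if not t_hit[j] and t[j] == ch:
                s_hit[i] = t_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    s_seq = [c for c, hit in zip(s, s_hit) if hit]
    t_seq = [c for c, hit in zip(t, t_hit) if hit]
    transpositions = sum(a != b for a, b in zip(s_seq, t_seq)) // 2
    return (matches / len(s) + matches / len(t)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s, t, p=0.1):
    """Jaro similarity boosted by a common prefix of up to four characters."""
    score = jaro(s, t)
    prefix = 0
    for a, b in zip(s[:4], t[:4]):
        if a != b:
            break
        prefix += 1
    return score + prefix * p * (1.0 - score)


def rank_candidates(error, lexicon, jw_threshold=0.7):
    """Filter by Jaro-Winkler similarity, then order by weighted edit distance."""
    scored = [(w, jaro_winkler(error, w)) for w in lexicon]
    kept = [(w, jw) for w, jw in scored if jw >= jw_threshold]
    kept.sort(key=lambda item: (weighted_edit_distance(error, item[0]), -item[1]))
    return [w for w, _ in kept]
```

With the erroneous form `كتال` and the small lexicon `["كتان", "كتاب", "قلم"]`, this sketch discards `قلم` at the filtering step and ranks `كتاب` ahead of `كتان`, because `ل` and `ب` are neighbouring keys in the assumed layout while `ل` and `ن` are not. An unweighted edit distance would leave the two candidates tied at a cost of 1.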
