Phonetic-based Sindhi spellchecker system using a hybrid model

This article presents a novel hybrid-model architecture for a Sindhi spellchecker system; no such system had been developed prior to this work. The compound textual forms and glyphs of the Sindhi language present a substantial challenge both for detecting misspelled words and for generating a list of similar suggestions. To be accurate and efficient, such a system must take phonetic-based Sindhi language rules and patterns into account. This research proposes a simple and efficient hybrid system that combines three algorithms. The Edit Distance algorithm measures the similarity between two Sindhi strings. The phonetic-based SoundEx algorithm and the ShapeEx algorithm, both developed in this work, perform pattern or glyph matching, and their combination generates an accurate and efficient suggestion list for incorrect or misspelled Sindhi words. The article presents a table of Sindhi characters grouped by phonetic similarity, along with a second table of characters grouped by similar glyph or shape. The system has been successfully integrated into a previously developed Sindhi word processor application. The Sindhi word segmentation methodology and algorithms required by the spellchecker have already been published and so are not discussed in detail in this article.
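The combination of Edit Distance with a SoundEx-style character grouping can be illustrated with a minimal sketch. It is not the authors' implementation: the character groups in `SOUND_GROUPS`, the function names, and the ranking rule (phonetic matches first, then smaller edit distance) are all assumptions made for illustration, and the paper's own tables of phonetic and shape-based groups are not reproduced here. A ShapeEx-style key could presumably be built the same way from a table of glyph groups.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# ASSUMPTION: illustrative groups of similar-sounding letters only;
# these are NOT the groups tabulated in the article.
SOUND_GROUPS = ["ثسص", "ذزضظ", "تط", "حه"]
SOUND_CODE = {ch: str(i)
              for i, group in enumerate(SOUND_GROUPS, start=1)
              for ch in group}


def sound_key(word: str) -> str:
    """Replace each grouped letter by its group code; keep other letters."""
    return "".join(SOUND_CODE.get(ch, ch) for ch in word)


def suggest(word: str, lexicon: list[str],
            max_distance: int = 2, limit: int = 5) -> list[str]:
    """Rank lexicon entries within max_distance edits of word.

    Hypothetical ranking: candidates with the same sound key come first,
    then candidates are ordered by increasing edit distance.
    """
    key = sound_key(word)
    scored = []
    for candidate in lexicon:
        distance = edit_distance(word, candidate)
        if distance <= max_distance:
            scored.append((sound_key(candidate) != key, distance, candidate))
    scored.sort()
    return [candidate for _, _, candidate in scored[:limit]]
```

With these assumed groups, a word typed with ص in place of س produces the same sound key as the intended spelling, so that spelling is ranked ahead of candidates that are merely close in edit distance.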
