A spell checker for a low-resourced and morphologically rich language

Spell checking plays an important role in improving the quality of documents by identifying the misspelled words they contain. Considerable effort has gone into advancing spell checkers for other languages; English, for example, has mature spell checking systems (e.g. Microsoft Word). However, few efforts have been made to develop an efficient Filipino spell checker. One major challenge for existing Filipino spell checkers, which are dictionary-based, is the lack of a complete dictionary that captures all inflected forms of a word (e.g. isinasama ‘including’, isasama ‘will be included’, and isinama ‘included’, with the base form sama ‘include’), borrowings (e.g. magtex ‘to text’ and nagtex ‘texted’), and code-switching (e.g. magtext ‘to text’ and nag-text ‘texted’, with the base form ‘text’). In addition, existing systems cannot handle code-switching, so valid words are marked as erroneous. In this research, a spell checker is designed for Filipino, a low-resourced and morphologically rich language. It detects and corrects typographical errors in the language and introduces a modified version of the metaphone algorithm for ranking the candidate suggestions. The system achieves 81% recall, 53.64% precision, 64.53% F-measure, and 87.78% suggestion adequacy on 100 sentences taken from exercise documents of Filipino students.
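The reported F-measure can be related to the reported precision and recall with a short calculation. The sketch below assumes the balanced F-measure (F1, the harmonic mean of precision and recall); the abstract does not state which weighting was used, and the function name `f_measure` and its `beta` parameter are illustrative rather than taken from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing was detected correctly
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Figures as reported in the abstract (already rounded by the authors).
reported_precision = 0.5364
reported_recall = 0.81

# Assumption: balanced weighting (beta = 1).
f1 = f_measure(reported_precision, reported_recall)
print(f"F1 = {f1:.2%}")
```

With these rounded inputs the result is about 64.54%, which agrees with the reported 64.53% up to rounding of the inputs.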
