Predictive text input for Indic scripts

Languages with many letters pose a problem for text entry on reduced keyboards. Multitap is time-consuming, as a single key on a mobile phone can carry 6-9 characters. For singletap methods, more letters per key means more words per key sequence, i.e. greater ambiguity when selecting which word to present to the user. Today's singletap methods for mobile phones mostly rely on a dictionary and word frequencies, which works remarkably well with the Latin alphabet but is not enough when the number of letters per key increases. In this master's thesis we investigated different methods to improve word disambiguation: word bigrams, part-of-speech n-grams, and keypad remappings. We chose the Devanagari script for our implementation, as it is one of the scripts with this problem, and used Hindi for the language-specific data. We found that a dictionary-based solution with word bigrams, combined with a remapped keypad layout, gave the desired results. These techniques increased disambiguation accuracy from 77% to 94% and improved keystrokes per character (KSPC) from 1.0856 to 1.0154.
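To illustrate the kind of dictionary-based disambiguation with word bigrams described above, here is a minimal sketch in Python. It is not the thesis implementation: the keypad is the standard Latin phone layout standing in for a Devanagari mapping, and the word counts, bigram counts, and function names are all hypothetical values chosen for the example.

```python
from collections import defaultdict

# Assumption: a Latin phone keypad, used here only as a stand-in
# for a (remapped) Devanagari layout.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}


def key_sequence(word):
    """Return the singletap key sequence for a word, one key press per letter."""
    return "".join(LETTER_TO_KEY[ch] for ch in word)


def build_index(unigram_counts):
    """Group dictionary words by the key sequence that produces them."""
    index = defaultdict(list)
    for word in unigram_counts:
        index[key_sequence(word)].append(word)
    return index


def candidates(keys, previous_word, index, unigram_counts, bigram_counts):
    """Rank the words matching a key sequence.

    Words are ordered first by the bigram count with the previous word,
    then by plain word frequency, which acts as the fallback when no
    bigram has been observed.
    """
    def score(word):
        bigram = bigram_counts.get((previous_word, word), 0)
        return (bigram, unigram_counts[word])

    return sorted(index.get(keys, []), key=score, reverse=True)


# Hypothetical toy counts; a real system would estimate them from a corpus.
UNIGRAMS = {"good": 50, "home": 40, "gone": 30, "hood": 5, "go": 60, "in": 70}
BIGRAMS = {("go", "home"): 12, ("go", "good"): 1}
INDEX = build_index(UNIGRAMS)

if __name__ == "__main__":
    # "good", "home", "gone" and "hood" all share the key sequence 4663.
    print(candidates("4663", None, INDEX, UNIGRAMS, BIGRAMS))
    print(candidates("4663", "go", INDEX, UNIGRAMS, BIGRAMS))
```

Without context, the most frequent word for the key sequence is offered first; with the previous word "go", the bigram count moves "home" to the top. A keypad with more letters per key enlarges each candidate list, which is why frequency alone becomes insufficient.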
