Asynchronous Multimodal Text Entry Using Speech and Gesture Keyboards

We propose reducing errors in text entry by combining speech and gesture keyboard input. We describe a merge model that combines recognition results from the two modalities in an asynchronous and flexible manner. We collected speech and gesture data from users entering both short email sentences and web search queries. Merging the recognition results from both modalities reduced the word error rate by 53% relative for email sentences and by 29% relative for web search queries. For email utterances containing speech recognition errors, we also investigated having users provide gesture keyboard corrections for only the erroneous words. Without the user explicitly indicating which words were incorrect, our model reduced the word error rate by 44% relative.

Index Terms: mobile text entry, multimodal interfaces
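The results above are stated as relative reductions in word error rate (WER). To make those figures concrete, the sketch below computes WER in the standard way, as the word-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of reference words, and then computes a relative reduction between two WERs. The function names are hypothetical and the numbers in the usage note are illustrative only; the abstract does not report the underlying absolute error rates, and this is not the authors' scoring code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / len(reference words).

    Computed with a word-level Levenshtein (edit distance) dynamic program.
    Illustrative sketch; not the paper's scoring implementation.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (row i-1 of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline errors removed (0.53 means a 53% relative reduction)."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # One substitution ("the" -> "a") and one deletion ("today"): 2 errors / 4 words.
    print(word_error_rate("send the report today", "send a report"))
    # Hypothetical WERs: a speech-only 10% WER falling to 4.7% after merging.
    print(relative_reduction(0.10, 0.047))
```

Note that a relative reduction depends on the baseline: a drop from 10% to 4.7% WER and a drop from 2% to 0.94% WER are both 53% relative reductions, which is why relative figures are usually reported alongside the absolute baseline.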
