Error Correction Using Long Context Match for Smartphone Speech Recognition

Most error correction interfaces for speech recognition applications on smartphones require the user to first mark an error region and then choose the correct word from a candidate list. We propose a simple multimodal interface that makes this process more efficient. We develop Long Context Match (LCM) to obtain candidates that complement the conventional word confusion network (WCN). Assuming that not only the preceding words but also the succeeding words of the error region are validated by the user, we use both contexts to search higher-order n-gram corpora for matching word sequences. For this purpose, we also utilize Web text data. Furthermore, we propose a combination of LCM and WCN ("LCM + WCN") that provides users with candidate lists more relevant than those yielded by WCN alone. We compare our interface with the WCN-based interface on the Corpus of Spontaneous Japanese (CSJ). Our proposed "LCM + WCN" method improved the 1-best accuracy by 23% and the Mean Reciprocal Rank (MRR) by 28%, and our interface reduced the user's load by 12%.

Key words: speech recognition, error correction, multimodal interface, word confusion network, context match
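The core idea of LCM, matching both the validated left and right contexts of an error region against an n-gram corpus and ranking what appears in between, can be sketched as follows. This is a minimal illustration under assumed data structures (a flat dictionary of n-gram counts), not the paper's actual implementation; the function name and toy counts are hypothetical.

```python
from collections import Counter

def lcm_candidates(left_ctx, right_ctx, ngram_counts, max_len=3):
    """Hypothetical sketch of Long Context Match: collect word sequences
    that occur between the validated left and right contexts in an
    n-gram corpus, ranked by corpus count."""
    scores = Counter()
    for ngram, count in ngram_counts.items():
        words = ngram.split()
        # the n-gram must begin with the left context and end with the right
        if (words[:len(left_ctx)] == left_ctx
                and words[-len(right_ctx):] == right_ctx):
            middle = words[len(left_ctx):len(words) - len(right_ctx)]
            if 0 < len(middle) <= max_len:
                scores[" ".join(middle)] += count
    # most frequent fillers first; a real system would interpolate
    # these with WCN posteriors ("LCM + WCN")
    return [w for w, _ in scores.most_common()]

# toy n-gram counts (invented for illustration)
ngrams = {
    "turn on the light": 12,
    "turn on a light": 3,
    "turn off the light": 5,
}
print(lcm_candidates(["turn", "on"], ["light"], ngrams))  # ['the', 'a']
```

In practice the candidates from this context match would be merged with the WCN's candidate list, so that words unseen in the recognizer's lattice can still be offered to the user.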
