Error correction of voicemail transcripts in SCANMail

Despite its widespread use, voicemail presents numerous usability challenges: people must listen to messages in their entirety, they cannot search by keyword, and audio files do not naturally support visual skimming. SCANMail addresses these shortcomings by automatically generating text transcripts of voicemail messages and presenting them in an email-like interface. Transcripts facilitate quick browsing and permanent archiving. However, errors from automatic speech recognition (ASR) limit the usefulness of the transcripts. The work presented here addresses this problem by evaluating user-initiated error correction of transcripts. User studies of two editor interfaces (a grammar-assisted menu and simple replacement by typing) reveal reduced audio playback times and an emphasis on editing important words with the menu, suggesting its value in mobile environments where limited input capabilities are the norm and user privacy is essential. The study also adds to the scarce body of work on ASR confidence shading, suggesting that shading may be more helpful than previously reported.
