Supporting Collaborative Transcription of Recorded Speech with a 3D Game Interface

The amount of speech data available online and in institutional repositories, including recordings of lectures, podcasts, news broadcasts, etc., has grown greatly in recent years. Effective access to such data demands transcription. While current automatic speech recognition technology can help with this task, the results of automatic transcription alone are often unsatisfactory. Recently, approaches combining automatic speech recognition and collaborative transcription have been proposed, in which geographically distributed users edit and correct automatically generated transcripts. These approaches, however, are based on traditional text-editor interfaces, which provide little satisfaction to the users who perform these time-consuming tasks, most often on a voluntary basis. We present a 3D "transcription game" interface which aims to improve the user experience of the transcription task and, ultimately, to create an extra incentive for users to engage in collaborative transcription in the first place.
