German Speech Recognition: A Solution for the Analysis and Processing of Lecture Recordings

Since recording technology has become more robust and easier to use, more and more universities are taking the opportunity to record their lectures and publish them on the Web to make them accessible to students. Automatic speech recognition (ASR) techniques provide a valuable source for indexing and retrieval of lecture video materials. In this paper, we evaluate state-of-the-art speech recognition software to find a solution for the automatic transcription of German lecture videos. Our experimental results show that the word error rate (WER) was reduced by 12.8% when the speech training corpus of a lecturer was increased by 1.6 hours.
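For readers unfamiliar with the metric, WER is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that standard definition, not the evaluation tooling used in this work; the function name `word_error_rate` and the sample sentences are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (an assumption, not taken from the paper):
    tokens are obtained by splitting on whitespace, with no text
    normalization such as lowercasing or punctuation removal.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one of four reference words is missing.
wer = word_error_rate("das ist ein Test", "das ist Test")
print(f"WER = {wer:.2%}")
```

Because insertions count as errors, WER computed this way can exceed 100% when the hypothesis is much longer than the reference.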
