Paper information - Web-based language modelling for automatic lecture transcription

Universities have long relied on written text to share knowledge. As more lectures are made available online, they must be accompanied by textual transcripts to provide the same access to information as textbooks. While Automatic Speech Recognition (ASR) is a cost-effective way to produce transcripts, its accuracy on lectures is not yet satisfactory. One approach to improving lecture ASR is to build smaller, topic-dependent Language Models (LMs) and combine them (through LM interpolation or hypothesis space combination) with general-purpose, large-vocabulary LMs. In this paper, we propose a simple solution for lecture ASR that achieves Word Error Rate reductions (and topic-specific keyword identification accuracies) similar to or better than those of combination-based approaches. Our method eliminates the need for two types of LMs by exploiting the lecture slides to collect a web corpus suited to modelling both the conversational and the topic-specific styles of lectures.
