Paper information - Dynamic language model adaptation using presentation slides for lecture speech recognition

We propose a dynamic language model adaptation method for lecture speech recognition that uses the temporal information of lecture slides. The method consists of two steps. First, the language model is adapted with the text extracted from all the slides of a given lecture. Next, the text of a given slide is extracted based on temporal information and used for local adaptation. Hence, the language model used to recognize the speech associated with a given slide changes dynamically from one slide to the next. We evaluated the method on speech data from four Japanese lecture courses. The experiments show that the method is effective, especially for keyword detection: the F-measure error rate for lecture keywords was reduced by 2.4%.
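The two-step scheme described above can be illustrated with a minimal sketch. The abstract does not say which adaptation technique or model order the authors use, so the unigram models, the linear interpolation, the weights `w_global` and `w_local`, and all function names below are assumptions chosen for illustration only. Slide start times stand in for the "temporal information".

```python
import bisect
from collections import Counter


def unigram(tokens):
    """Maximum-likelihood unigram distribution over a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def interpolate(base, adapt, weight):
    """Linear mix: (1 - weight) * base + weight * adapt (assumed technique)."""
    vocab = set(base) | set(adapt)
    return {
        w: (1 - weight) * base.get(w, 0.0) + weight * adapt.get(w, 0.0)
        for w in vocab
    }


def slide_at(time_sec, slide_starts):
    """Index of the slide on screen at time_sec, given sorted start times."""
    # Times before the first start are clamped to the first slide.
    return max(0, bisect.bisect_right(slide_starts, time_sec) - 1)


def adapted_model(time_sec, base, slides, slide_starts,
                  w_global=0.3, w_local=0.3):
    """Two-step adaptation for the speech occurring at time_sec."""
    # Step 1: global adaptation with the text of all slides in the lecture.
    all_tokens = [tok for slide in slides for tok in slide]
    global_lm = interpolate(base, unigram(all_tokens), w_global)
    # Step 2: local adaptation with the slide selected by its timing.
    current = slides[slide_at(time_sec, slide_starts)]
    return interpolate(global_lm, unigram(current), w_local)


# Toy data (hypothetical): a background model and two tokenized slides.
base = {"the": 0.5, "model": 0.5}
slides = [["language", "model"], ["acoustic", "model"]]
slide_starts = [0.0, 60.0]  # seconds at which each slide appears

lm_first = adapted_model(10.0, base, slides, slide_starts)
lm_second = adapted_model(70.0, base, slides, slide_starts)
```

In this toy setup, "language" receives more probability while the first slide is shown than while the second is shown, which is the slide-by-slide change the abstract describes. A real system would adapt an n-gram model inside the recognizer rather than a bare unigram table.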
