The LIMSI RT06s Lecture Transcription System

This paper describes recent research carried out within the FP6 Integrated Project CHIL to develop a system that automatically transcribes lectures and presentations. Because only a small amount of CHIL data was available for system development, widely available corpora were used to train both the acoustic and language models. Acoustic model training made use of the transcribed portion of the TED corpus of Eurospeech recordings, as well as the ICSI, ISL, and NIST meeting corpora. For language model training, text materials were extracted from a variety of on-line conference proceedings. Experimental results are reported for close-talking and far-field microphones on development and evaluation data.
