The LIMSI RT07 Lecture Transcription System

A system to automatically transcribe lectures and presentations has been developed in the context of the FP6 Integrated Project CHIL. In addition to the seminar data recorded by the CHIL partners, widely available corpora were used to train both the acoustic and language models. Acoustic model training made use of the transcribed portion of the TED corpus of Eurospeech recordings, as well as the ICSI, ISL, and NIST meeting corpora. For language model training, text materials were extracted from a variety of on-line conference proceedings. Experimental results are reported for close-talking and far-field microphones on development and evaluation data.
