Real-time ASR from meetings

The AMI(DA) system is a meeting room speech recognition system that has been developed and evaluated in the context of the NIST Rich Transcription (RT) evaluations. Recently, the "Distant Access" requirements of the AMIDA project have necessitated that the system operate in real-time. Another, more difficult, requirement is that the system fit into a live meeting transcription scenario. We describe an infrastructure that has allowed the AMI(DA) system to evolve into one that fulfils these extra requirements, emphasising the components that address the live and real-time aspects.
