The ICSI-SRI Spring 2006 Meeting Recognition System

We describe the development of the ICSI-SRI speech recognition system for the National Institute of Standards and Technology (NIST) Spring 2006 Meeting Rich Transcription (RT-06S) evaluation, highlighting improvements made since last year, including improvements to the delay-and-sum algorithm, the nearfield segmenter, language models, posterior-based features, HMM adaptation methods, and adapting to a small amount of new lecture data. Results are reported on RT-05S and RT-06S meeting data. Compared to the RT-05S conference system, we achieved an overall improvement of 4% relative in the MDM and SDM conditions, and 11% relative in the IHM condition. On lecture data, we achieved an overall improvement of 8% relative in the SDM condition, 12% on MDM, 14% on ADM, and 15% on IHM.

[1]  Andreas G. Andreou,et al.  Investigation of silicon auditory models and generalization of linear discriminant analysis for improved speech recognition , 1997 .

[2]  Andreas Stolcke,et al.  Prosodic knowledge sources for automatic speech recognition , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[3]  Daniel Povey,et al.  Minimum Phone Error and I-smoothing for improved discriminative training , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[4]  Lori Lamel,et al.  The translanguage English database (TED) , 1994, ICSLP.

[5]  J. Flanagan,et al.  Computer‐steered microphone arrays for sound transduction in large rooms , 1985 .

[6]  Jean-Luc Gauvain,et al.  Transcribing lectures and seminars , 2005, INTERSPEECH.

[7]  Andreas Stolcke,et al.  Improved speech activity detection using cross-channel features for recognition of multiparty meetings , 2006, INTERSPEECH.

[8]  John McDonough,et al.  Tracking and Far-Field Speech Recognition for Multiple Simultaneous Speakers , 2006 .

[9]  Andreas Stolcke,et al.  Further Progress in Meeting Recognition: The ICSI-SRI Spring 2005 Speech-to-Text Evaluation System , 2005, MLMI.

[10]  Francis Kubala,et al.  Fast Robust Inverse Transform SAT and Multi-stage Adaptation , 1998 .

[11]  Andreas Stolcke,et al.  PROGRESS IN MEETING RECOGNITION: THE ICSI-SRI-UW SPRING 2004 EVALUATION SYSTEM , 2008 .

[12]  Andreas Stolcke,et al.  Voicing feature integration in SRI's decipher LVCSR system , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[13]  Hynek Hermansky,et al.  Qualcomm-ICSI-OGI features for ASR , 2002, INTERSPEECH.

[14]  Xavier Anguera Miró,et al.  Robust Speaker Segmentation for Meetings: The ICSI-SRI Spring 2005 Diarization System , 2005, MLMI.

[15]  Thomas Hain,et al.  Strategies for Language Model Web-Data Collection , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[16]  John McDonough,et al.  Tracking multiple simultaneous speakers with probabilistic data association filteres , 2007 .

[17]  Andreas Stolcke,et al.  Trapping conversational speech: extending TRAP/tandem approaches to conversational telephone speech recognition , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[18]  José Manuel Pardo,et al.  Robust Speaker Diarization for meetings , 2006 .

[19]  Andreas Stolcke,et al.  Using MLP features in SRI's conversational speech recognition system , 2005, INTERSPEECH.