Paper Information - The ICSI-SRI Spring 2006 Meeting Recognition System

We describe the development of the ICSI-SRI speech recognition system for the National Institute of Standards and Technology (NIST) Spring 2006 Meeting Rich Transcription (RT-06S) evaluation, highlighting the changes made since last year's system: improvements to the delay-and-sum algorithm, the nearfield segmenter, the language models, the posterior-based features, and the HMM adaptation methods, as well as adaptation to a small amount of new lecture data. Results are reported on RT-05S and RT-06S meeting data. Compared to the RT-05S conference system, we achieved an overall relative improvement of 4% in the MDM and SDM conditions and 11% in the IHM condition. On lecture data, we achieved an overall relative improvement of 8% in the SDM condition, 12% in MDM, 14% in ADM, and 15% in IHM.
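The gains above are relative, not absolute. As a minimal sketch of that arithmetic, assuming the underlying metric is word error rate (the abstract does not name it), the relative improvement is the error reduction divided by the baseline error. The function name and the numbers below are hypothetical illustrations, not results from the paper.

```python
def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Return the error reduction as a percentage of the baseline error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical error rates (in percent), chosen only to illustrate the formula.
baseline = 25.0
improved = 24.0

# An absolute drop of 1.0 point from a 25.0 baseline is a 4% relative gain.
print(relative_improvement(baseline, improved))
```

Under this definition, the same absolute reduction counts as a larger relative improvement when the baseline error is lower, which is why relative figures are not directly comparable across conditions with very different baselines.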
