The 2004 BBN 1xRT recognition systems for English broadcast news and conversational telephone speech

This paper describes the BBN real-time recognition systems used in the 2004 Rich Transcription (RT) benchmark test for the English Conversational Telephone Speech (CTS) and Broadcast News (BN) tasks. We describe the system architecture, along with the algorithms we used to reduce computation with minimal impact on recognition accuracy. Particular choices in the design of the final system are analyzed to show the trade-offs between speed and accuracy. We also present a recently developed architecture for the real-time systems, which outperforms the systems we submitted for the RT04 benchmark tests in both domains.
