Progress in transcription of Broadcast News using Byblos

In this paper, we describe our progress during the last four years (1995-1999) in automatic transcription of broadcast news from radio and television using the BBN Byblos speech recognition system. Overall, we achieved steady progress as reflected through the results of the last four DARPA Hub-4 evaluations, with word error rates of 42.7%, 31.8%, 20.4% and 14.7% in 1995, 1996, 1997 and 1998, respectively. This progress can be attributed to improvements in acoustic modeling, channel and speaker adaptation, and search algorithms, as well as dealing with specific characteristics of the real-life variable speech found in broadcast news. Besides improving recognition accuracy, we also succeeded in developing several algorithms to achieve close-to-real-time recognition speed without a significant sacrifice in recognition accuracy.

[1]  Michael Picheny,et al.  Decision-tree based feature-space quantization for fast Gaussian computation , 1997, 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings.

[2]  Pascale Fung,et al.  The estimation of powerful language models from small and large corpora , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[3]  Richard M. Schwartz,et al.  A compact model for speaker-adaptive training , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[4]  Jonathan G. Fiscus,et al.  A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER) , 1997, 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings.

[5]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[6]  Steve Austin,et al.  The forward-backward search algorithm , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[7]  Richard M. Schwartz,et al.  Towards a robust real-time decoder , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[8]  Daben Liu,et al.  Fast speaker change detection for broadcast news transcription and indexing , 1999, EUROSPEECH.

[9]  H. Ney,et al.  Improvements in beam search for 10000-word continuous speech recognition , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[10]  Richard M. Schwartz,et al.  Search Algorithms for Software-Only Real-Time Recognition with Very Large Vocabularies , 1993, HLT.

[11]  Alex Acero,et al.  AUGMENTED CEPSTRAL NORMALIZATION FOR ROBUST SPEECH RECOGNITION , 2000 .

[12]  John Makhoul,et al.  The 1998 BBN BYBLOS Primary System applied to English and Spanish Broadcast News Transcription , 1998 .

[13]  Mei-Yuh Hwang,et al.  Predicting unseen triphones with senones , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[14]  Richard M. Schwartz,et al.  Advances in transcription of broadcast news , 1997, EUROSPEECH.

[15]  Bruce T. Lowerre,et al.  The HARPY speech recognition system , 1976 .

[16]  Richard M. Schwartz,et al.  Single-tree method for grammar-directed search , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[17]  Philip C. Woodland,et al.  Speaker adaptation of HMMs using linear regression , 1994 .

[18]  John Makhoul,et al.  Further advances in transcription of broadcast news , 1999, EUROSPEECH.

[19]  L. Baum,et al.  A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains , 1970 .

[20]  Herbert Gish,et al.  Recent experiments in large vocabulary conversational speech recognition , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).