Automatic transcription of Broadcast News

Abstract: This paper describes the IBM approach to Broadcast News (BN) transcription. The BN transcription task typically raises five problems: segmentation, clustering, acoustic modeling, language modeling, and acoustic model adaptation. The paper presents new algorithms for each of these problems. Key ideas include the Bayesian information criterion (BIC), applied to segmentation, clustering, and acoustic modeling, and speaker/cluster adapted training (SAT/CAT).
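To make the role of BIC in segmentation concrete, the following is a minimal sketch of one common way the criterion is applied to detect a single acoustic change point in a window of feature frames. It is an illustration under stated assumptions, not necessarily the formulation used in the paper: each segment is modeled by one full-covariance Gaussian, and the names `delta_bic`, `find_change`, `margin`, and the penalty weight `lam` are hypothetical.

```python
import numpy as np


def delta_bic(frames, i, lam=1.0):
    """BIC gain from splitting `frames` (n x d) at index i.

    Compares one full-covariance Gaussian for the whole window against
    two Gaussians, one per side of the split. Positive values favor a
    change point at i. (Assumed formulation, for illustration only.)
    """
    n, d = frames.shape

    def logdet(x):
        # Maximum-likelihood covariance (bias=True), forced to d x d so
        # the one-dimensional case also works with slogdet.
        cov = np.cov(x, rowvar=False, bias=True).reshape(d, d)
        _, value = np.linalg.slogdet(cov)
        return value

    # Log-likelihood gain of the two-Gaussian model over the single one.
    gain = 0.5 * (n * logdet(frames)
                  - i * logdet(frames[:i])
                  - (n - i) * logdet(frames[i:]))
    # Penalty for the extra mean vector and covariance matrix.
    penalty = 0.5 * lam * (d + d * (d + 1) / 2) * np.log(n)
    return gain - penalty


def find_change(frames, margin=10, lam=1.0):
    """Return (index, score) of the best split, or (None, score) if no
    split has positive BIC gain. `margin` keeps both sides long enough
    for a stable covariance estimate."""
    n = frames.shape[0]
    candidates = list(range(margin, n - margin))
    scores = [delta_bic(frames, i, lam) for i in candidates]
    best = int(np.argmax(scores))
    if scores[best] > 0:
        return candidates[best], scores[best]
    return None, scores[best]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for acoustic features: two "speakers" with
    # different means, 200 frames each.
    frames = np.vstack([rng.normal(0.0, 1.0, (200, 2)),
                        rng.normal(5.0, 1.0, (200, 2))])
    print(find_change(frames))
```

In this sketch the penalty term grows with the feature dimension and the window length, so a split is accepted only when the likelihood gain outweighs the cost of the extra parameters. The same comparison can in principle be reused for clustering by asking whether two segments are better modeled jointly or separately.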
