In this paper, we describe our recent work in automatic transcription of broadcast news programming from radio and television. This is a very challenging recognition problem because of the frequent and unpredictable changes that occur in speaker, speaking style, topic, channel, and background conditions. Faced with such a problem, there is a strong tendency to try to carve the input into separable classes and deal with each one independently. We have chosen instead to rely on condition-independent models and adaptive algorithms to deal with this highly variable data. In addition, we have developed effective techniques to automatically segment the input waveform and cluster the segments into data sets containing similar speakers and conditions, in order to support unsupervised adaptation on the test data. Using this general approach, we achieved the best overall word error rate of 31.8% on the 1996 DARPA Hub-4 Unpartitioned Evaluation.