Automatic recognition of broadcast feeds from radio and television sources has been gaining importance recently, especially with the success of systems such as the CMU Informedia system [1]. In this work we describe the problems faced in adapting a system built to recognize one utterance at a time to a task that requires recognition of an entire half-hour show. We break the problem into three components: segmentation, classification, and clustering. We show that a priori knowledge of the acoustic conditions and speakers in the broadcast data is not required for segmentation. The system is able to detect changes in acoustic conditions, recognize previously observed conditions, and use this information to pool adaptation data. We also describe a novel application of the symmetric Kullback-Leibler distance, which serves as a single solution to both the segmentation and clustering problems. The three components are evaluated by comparing results on the Partitioned and Unpartitioned portions of the 1996 ARPA Hub 4 evaluation test set.
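The abstract does not give the distance formula or the exact procedure, so the following is only a minimal sketch of how one symmetric Kullback-Leibler distance can drive both segmentation and clustering. It assumes that each window of acoustic feature vectors (e.g. cepstra) is modelled by a single diagonal-covariance Gaussian and uses the standard closed form for the symmetric KL divergence between two Gaussians, summed over dimensions. The function names (`symmetric_kl`, `find_change_points`, `cluster_segments`), the window length, step size, and thresholds are hypothetical illustration choices, not values from the paper.

```python
import numpy as np


def symmetric_kl(x, y, eps=1e-6):
    """Symmetric KL divergence KL(p||q) + KL(q||p) between diagonal Gaussians
    fit (maximum likelihood) to frame arrays x and y of shape (frames, dims)."""
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    var_x = x.var(axis=0) + eps  # eps guards against zero variance
    var_y = y.var(axis=0) + eps
    d2 = (mu_x - mu_y) ** 2
    # Closed form per dimension; the log-variance terms cancel in the sum.
    per_dim = (0.5 * (var_x / var_y + var_y / var_x) - 1.0
               + 0.5 * d2 * (1.0 / var_x + 1.0 / var_y))
    return float(per_dim.sum())


def find_change_points(features, win=100, step=10, threshold=5.0):
    """Segmentation (hypothetical parameters): compare two adjacent windows
    around each candidate frame t and report local maxima of the distance
    that exceed the threshold as acoustic change points."""
    n = len(features)
    centers, dists = [], []
    for t in range(win, n - win + 1, step):
        centers.append(t)
        dists.append(symmetric_kl(features[t - win:t], features[t:t + win]))
    changes = []
    for i in range(1, len(dists) - 1):
        if dists[i] > threshold and dists[i] >= dists[i - 1] and dists[i] >= dists[i + 1]:
            changes.append(centers[i])
    return changes


def cluster_segments(segments, threshold=1.0):
    """Clustering (hypothetical greedy agglomerative scheme): repeatedly merge
    the closest pair of clusters, pooling their frames, while the distance
    stays below the threshold. Returns lists of segment indices."""
    clusters = [[i] for i in range(len(segments))]
    data = list(segments)
    while len(data) > 1:
        best = None
        for i in range(len(data)):
            for j in range(i + 1, len(data)):
                d = symmetric_kl(data[i], data[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d >= threshold:
            break
        data[i] = np.vstack([data[i], data[j]])  # pool frames of merged clusters
        clusters[i].extend(clusters[j])
        del data[j], clusters[j]  # j > i, so index i is unaffected
    return clusters
```

For example, on synthetic two-dimensional features that switch from a mean of 0 to a mean of 3 at frame 500, `find_change_points` reports a single change near frame 500; and given four segments alternating between those two conditions, `cluster_segments` groups segments 0 and 2 together and 1 and 3 together. Using one distance for both steps mirrors the abstract's point that a single measure can serve segmentation and clustering; a real system would tune the thresholds on held-out broadcast data.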
[1] Richard O. Duda et al. Pattern Classification and Scene Analysis. Wiley-Interscience, 1974.
[2] Thomas M. Cover et al. Elements of Information Theory. 2005.
[3] Richard M. Stern et al. Recognition of Continuous Broadcast News with Multiple Unknown Speakers and Environments. 1995.
[4] Takeo Kanade et al. Informedia Digital Video Library. CACM, 1995.
[5] Til T. Phan et al. Text-Independent Speaker Identification. 1999.