Detecting bandlimited audio in broadcast television shows

For TV and radio shows containing narrowband speech, speech-to-text (STT) accuracy on the narrowband audio can be improved by using an acoustic model trained on acoustically matched data. To apply such a model selectively, one must first be able to accurately detect which audio segments are narrowband. The present paper explores two bandwidth classification approaches: a traditional Gaussian mixture model (GMM) approach and a spline-based classifier that categorizes audio segments based on their power spectra. We focus on shows found in the DARPA GALE Mandarin training and test sets, where the ratio of wideband to narrowband shows is very large. In this setting, the spline-based classifier reduces the number of misclassified wideband segments by up to 95% relative to the GMM-based classifier for the same number of misclassified narrowband segments.
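To make the idea of classifying bandwidth from a power spectrum concrete, the following is a minimal sketch of a much simpler heuristic than either approach studied here: it is neither the GMM classifier nor the spline-based classifier, only a fixed threshold on the fraction of spectral power above a cutoff. The 16 kHz sampling rate, the 4 kHz cutoff, the 0.01 threshold, and the function names are all assumptions chosen for illustration, on the premise that narrowband material is telephone-like and carries little energy above roughly 4 kHz.

```python
import numpy as np

def high_band_energy_ratio(signal, sample_rate, cutoff_hz=4000.0):
    """Return the fraction of total spectral power at or above cutoff_hz."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    total = power.sum()
    if total == 0.0:
        return 0.0  # silent segment: no evidence of high-band energy
    return float(power[freqs >= cutoff_hz].sum() / total)

def is_narrowband(signal, sample_rate, cutoff_hz=4000.0, threshold=0.01):
    """Label a segment narrowband if almost no power lies above cutoff_hz.

    cutoff_hz and threshold are illustrative assumptions, not tuned values.
    """
    return high_band_energy_ratio(signal, sample_rate, cutoff_hz) < threshold

# Synthetic check: white noise stands in for wideband audio, and the same
# noise with all content above 3.4 kHz removed stands in for narrowband audio.
rng = np.random.default_rng(0)
sample_rate = 16000
wideband = rng.standard_normal(sample_rate)  # one second of noise

spectrum = np.fft.rfft(wideband)
freqs = np.fft.rfftfreq(len(wideband), d=1.0 / sample_rate)
spectrum[freqs > 3400.0] = 0.0
narrowband = np.fft.irfft(spectrum, n=len(wideband))

print(is_narrowband(wideband, sample_rate))
print(is_narrowband(narrowband, sample_rate))
```

A fixed threshold like this ignores how speech energy varies across segments and recording conditions, which is one reason a trained classifier over the spectrum is the more natural choice when wideband shows heavily outnumber narrowband ones.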
