The task of automatically transcribing general audio data differs substantially from the transcription task typically required of current automatic speech recognition systems. The general goal of this work is to quantify the difficulties posed by such data, leading to an understanding of how a speech recognition system may have to be altered to accommodate the added complexities. Specifically, we describe preliminary analyses and experiments conducted on data collected from a radio news program. Using relatively straightforward acoustic measurements and classification techniques, we were able to achieve better than 80% classification accuracy for seven salient sound classes present in the data, and nearly 94% classification accuracy for a speech/non-speech decision. In addition, lexical analysis revealed that while the vocabulary size of a single broadcast is moderate, it grows rapidly as more shows are added.