TRANSCRIBING MULTILINGUAL BROADCAST NEWS USING HYPOTHESIS DRIVEN LEXICAL ADAPTATION

This paper describes the first results of our DARPA-sponsored efforts toward recognizing and browsing foreign-language broadcast news, specifically in Serbo-Croatian. For Serbo-Croatian, as for many languages other than the most common, well-studied ones, broadcast-quality recognition is complicated by (1) the lack of available acoustic and language data, and (2) the excessive vocabulary growth of heavily inflected languages, which leads to unacceptable out-of-vocabulary (OOV) rates. We present a Serbo-Croatian large-vocabulary system that achieves a 74% recognition rate despite limited training data. The system achieves this rate with a multipass strategy that dynamically adapts the recognition dictionary to the speech segment being recognized by generating morphological variations of the hypothesized words (Hypothesis Driven Lexical Adaptation).
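The dictionary adaptation step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the suffix list, the function names (`stem_candidates`, `morphological_variants`, `adapt_dictionary`), the toy vocabularies, and the idea of filtering variants against a larger background word list are all assumptions made for illustration. The sketch takes the words of a first-pass hypothesis, generates inflectional variants of each, and adds those variants to the dictionary used for a second recognition pass.

```python
# Hypothetical inflectional endings, chosen only for illustration;
# a real system would use a proper morphological description of the language.
SUFFIXES = ("a", "e", "i", "u", "om", "ama")


def stem_candidates(word, suffixes=SUFFIXES):
    """Return the word itself plus every stem obtained by stripping one suffix."""
    stems = {word}
    for suffix in suffixes:
        # Require at least two characters to remain after stripping.
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            stems.add(word[: -len(suffix)])
    return stems


def morphological_variants(word, suffixes=SUFFIXES):
    """Generate candidate inflected forms of a hypothesized word."""
    variants = set()
    for stem in stem_candidates(word, suffixes):
        variants.add(stem)
        for suffix in suffixes:
            variants.add(stem + suffix)
    return variants


def adapt_dictionary(base_vocab, hypothesis, background_vocab):
    """Extend the base dictionary with variants of the first-pass hypothesis words.

    Only variants attested in the (assumed) background word list are kept,
    so that over-generated non-words do not enter the dictionary.
    """
    adapted = set(base_vocab)
    for word in hypothesis:
        adapted |= morphological_variants(word) & background_vocab
    return adapted


# Toy data: a small first-pass dictionary and a larger background word list.
base_vocab = {"vlada", "je", "danas"}
background_vocab = base_vocab | {"vlade", "vladi", "vladu", "vladom"}
first_pass = ["vlada", "je", "danas"]

adapted = adapt_dictionary(base_vocab, first_pass, background_vocab)
print(sorted(adapted))
```

In this toy run the inflected forms of "vlada" become available to a second recognition pass, even though they were outside the first-pass dictionary. How the actual system selects and ranks variants is not specified in the abstract.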
