Fast decoding for indexation of broadcast data

Processing time is an important factor in making a speech transcription system viable for automatic indexation of radio and television broadcasts. When word error rate is the only concern, it is common to design systems that run at 100 times real-time or more. This paper addresses the reduction of speech recognition time for automatic indexation of radio and TV broadcasts, with the aim of obtaining reasonable performance for close to real-time operation. We investigated computational budgets in the range of 1 to 10xRT on commonly available platforms. Constraints on the computational resources led us to reconsider design issues, particularly those concerning the acoustic models and the decoding strategy. A new decoder was implemented which transcribes broadcast data in a few times real-time with only a slight increase in word error rate compared to our best system. Experiments with spoken document retrieval show that comparable IR results are obtained with a 10xRT automatic transcription or with manual transcription, and that reasonable performance is still obtained with a 1.4xRT transcription system.
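The "xRT" figures quoted above express processing time as a multiple of the audio duration. The short sketch below illustrates that ratio under the usual reading of the term; the function names and the sample durations are hypothetical and are not taken from the paper.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (the xRT figure)."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def within_budget(processing_seconds: float, audio_seconds: float,
                  max_xrt: float) -> bool:
    """Check whether a decoding run fits a given xRT budget (hypothetical helper)."""
    return real_time_factor(processing_seconds, audio_seconds) <= max_xrt


if __name__ == "__main__":
    # Illustrative numbers only: 3000 s of audio decoded in 4200 s.
    rtf = real_time_factor(4200.0, 3000.0)
    print(f"{rtf:.1f}xRT")
    print(within_budget(4200.0, 3000.0, max_xrt=10.0))
```

Under this definition, a system running at 100 times real-time would need 100 hours of computation to transcribe one hour of broadcast audio.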
