The LIMSI SDR System for TREC-9

In this paper we describe the LIMSI Spoken Document Retrieval (SDR) system used in the TREC-9 evaluation. The system combines an adapted version of the LIMSI 1999 Hub-4E transcription system for speech recognition with text-based IR methods. Unlike the LIMSI TREC-8 system, this year’s system can index the audio data without knowledge of the story boundaries, using a double windowing approach. The query expansion procedure of the information retrieval component has been revised and now makes use of contemporaneous text sources. Experimental results are reported in terms of mean average precision for both the TREC SDR’99 and SDR’00 queries on the same 557-hour data set. With a 20% word error rate, this year’s system achieves a mean average precision of 0.5250 for SDR’99 and 0.3706 for SDR’00 in the focus condition with unknown story boundaries.
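For readers unfamiliar with the metric, the following is a minimal sketch of how mean average precision is conventionally computed over a set of queries. It is not the evaluation code used for the reported results; the function names, the document identifiers, and the choice to skip queries with no relevant documents are assumptions made for illustration.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision for one query.

    `ranked` is the system's ranked list of document ids (assumed to
    contain no duplicates); `relevant` is the set of ids judged relevant.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision measured at the rank of each relevant document.
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Mean of per-query average precision.

    Queries with no relevant documents are skipped (an assumption of
    this sketch).
    """
    scores = [
        average_precision(runs.get(query_id, []), relevant)
        for query_id, relevant in qrels.items()
        if relevant
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data: two queries with hand-made relevance judgments.
example_runs = {"q1": ["a", "b", "c", "d"], "q2": ["x", "y"]}
example_qrels = {"q1": {"a", "c"}, "q2": {"y"}}
example_map = mean_average_precision(example_runs, example_qrels)
```

In this toy example, the first query has relevant documents at ranks 1 and 3, and the second has its single relevant document at rank 2; the score is the mean of the two per-query values.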
