Information Retrieval from Unsegmented Broadcast News Audio

This paper describes a system for retrieving relevant portions of broadcast news shows starting with only the audio data. A novel method of automatically detecting and removing commercials is presented and shown to increase the performance of the system while also reducing the computational effort required. A sophisticated large vocabulary speech recogniser which produces high-quality transcriptions of the audio and a window-based retrieval system with post-retrieval merging are also described.

Results are presented using the 1999 TREC-8 Spoken Document Retrieval data for the task where no story boundaries are known. Experiments investigating the effectiveness of all aspects of the system are described, and the relative benefits of automatically eliminating commercials, enforcing broadcast structure during retrieval, using relevance feedback, changing retrieval parameters and merging during post-processing are shown.

An Average Precision of 46.8%, when duplicates are scored as irrelevant, is shown to be achievable using this system, with the corresponding word error rate of the recogniser being 20.5%.
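The abstract mentions post-retrieval merging of windows and an Average Precision in which duplicates are scored as irrelevant. The following is a minimal sketch of how those two ideas could be expressed, not the system's actual implementation. The `Window` fields, the `max_gap` merge rule, the choice to keep the higher score on merging, and the story identifiers are all assumptions made for illustration.

```python
from typing import List, NamedTuple, Optional


class Window(NamedTuple):
    # Hypothetical record for one retrieved window of transcript.
    show: str     # broadcast identifier
    start: float  # start time in seconds
    end: float    # end time in seconds
    score: float  # retrieval score, higher is better


def merge_windows(ranked: List[Window], max_gap: float = 0.0) -> List[Window]:
    """Merge windows from the same show that overlap or lie within max_gap seconds.

    Assumed rule: the merged window spans both inputs and keeps the higher score.
    """
    merged: List[Window] = []
    for w in sorted(ranked, key=lambda x: (x.show, x.start)):
        last = merged[-1] if merged else None
        if last is not None and last.show == w.show and w.start <= last.end + max_gap:
            merged[-1] = Window(last.show, last.start,
                                max(last.end, w.end), max(last.score, w.score))
        else:
            merged.append(w)
    # Return the merged windows as a ranked list, best score first.
    return sorted(merged, key=lambda x: -x.score)


def average_precision(hits: List[Optional[str]], num_relevant: int) -> float:
    """Average precision where a repeated hit on the same story counts as irrelevant.

    hits[i] is the id of the relevant story matched by the i-th ranked result,
    or None if that result matched no relevant story.
    """
    seen = set()
    total = 0.0
    for rank, story in enumerate(hits, start=1):
        if story is not None and story not in seen:
            seen.add(story)
            total += len(seen) / rank  # precision at this rank
    return total / num_relevant if num_relevant else 0.0


windows = [
    Window("showA", 0.0, 30.0, 1.0),
    Window("showA", 15.0, 45.0, 2.0),
    Window("showA", 100.0, 130.0, 0.5),
    Window("showB", 10.0, 40.0, 1.5),
]
merged = merge_windows(windows)
ap = average_precision(["s1", "s1", None, "s2"], num_relevant=2)
```

In this toy run the two overlapping `showA` windows collapse into one, which is one way merging can reduce the number of duplicate hits that would otherwise be penalised in the ranking.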
