Paper Information - Structuring Broadcast Audio for Information Access

One rapidly expanding application area for state-of-the-art speech recognition technology is the automatic processing of broadcast audiovisual data for information access. Since much of the linguistic information is found in the audio channel, speech recognition is a key enabling technology which, when combined with information retrieval techniques, can be used to search large audiovisual document collections. Audio indexing must take into account the specificities of audio data, such as the need to deal with a continuous data stream and an imperfect word transcription. Other important considerations are handling language specificities and facilitating language portability. At the Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI), broadcast news transcription systems have been developed for seven languages: English, French, German, Mandarin, Portuguese, Spanish, and Arabic. The transcription systems have been integrated into prototype demonstrators for several application areas, such as audio data mining, structuring audiovisual archives, selective dissemination of information, and topic tracking for media monitoring. As examples, this paper addresses the spoken document retrieval and topic tracking tasks.

Jean-Luc Gauvain | Lori Lamel
