
Annotation of Heterogeneous Multimedia Content Using Automatic Speech Recognition

This paper reports on the setup and evaluation of robust speech recognition system components geared towards generating transcripts for heterogeneous, real-life media collections. The system was deployed to generate speech transcripts for the NIST/TRECVID-2007 test collection, which is part of a Dutch real-life archive of news-related genres. Performance figures for this type of content are compared with figures for broadcast news test data.
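The abstract does not name the metric behind its performance figures. Speech recognition output is most commonly scored by word error rate (WER): the number of word substitutions (S), deletions (D) and insertions (I) in the best alignment of hypothesis to reference, divided by the number of reference words (N). The sketch below assumes WER is the relevant measure. It is an illustration, not the scoring setup used in the paper, and the function name `word_error_rate` is hypothetical. It computes a word-level Levenshtein distance with a two-row dynamic program.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + D + I) / N using a word-level Levenshtein alignment.

    Illustrative sketch only; assumes whitespace tokenisation and no
    text normalisation (case, punctuation, compounds).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # (one row back) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of reference word r
                curr[j - 1] + 1,     # insertion of hypothesis word h
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One deleted word out of four reference words.
    print(word_error_rate("het nieuws van vandaag", "het nieuws vandaag"))
```

Dropping "van" from a four-word reference gives one deletion, so the example prints a WER of 0.25. Note that WER can exceed 1.0 when the hypothesis contains many insertions, which is one reason error rates on heterogeneous archive material can look much worse than on clean broadcast news.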
