Annotation of Heterogeneous Multimedia Content Using Automatic Speech Recognition

This paper reports on the setup and evaluation of robust speech recognition system components geared towards transcript generation for heterogeneous, real-life media collections. The system is deployed to generate speech transcripts for the NIST/TRECVID-2007 test collection, which is part of a Dutch real-life archive of news-related genres. Performance figures for this type of content are compared with figures for broadcast news test data.
