Automated Speech and Audio Analysis for Semantic Access to Multimedia

The deployment and integration of audio processing tools can enhance the semantic annotation of multimedia content and, as a consequence, improve the effectiveness of conceptual access tools. This paper gives an overview of the various ways in which automatic speech and audio analysis can contribute to increased granularity of automatically extracted metadata. A number of techniques will be presented, including the alignment of speech and text resources, large-vocabulary speech recognition, keyword spotting and speaker classification. The applicability of these techniques will be discussed from a media-crossing perspective. The added value of the techniques and their potential contribution to the content value chain will be illustrated by the description of two complementary demonstrators for browsing broadcast news archives.
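As an illustration of how keyword spotting over time-coded speech recognition output can support browsing of broadcast news archives, the sketch below scans a list of (start time, word) transcript tokens for query terms and returns the matching time offsets. This is a minimal illustrative sketch only; the data format, function name and example transcript are assumptions and are not taken from the paper or any specific toolkit.

```python
# Minimal illustrative sketch (not from the paper): keyword spotting over
# time-coded ASR output. Each transcript token is assumed to be a
# (start_time_in_seconds, word) pair produced by a speech recognizer.

from typing import Iterable, List, Tuple


def spot_keywords(tokens: Iterable[Tuple[float, str]],
                  keywords: Iterable[str]) -> List[Tuple[str, float]]:
    """Return (keyword, start_time) pairs for every transcript token
    whose normalized form matches one of the query keywords."""
    wanted = {k.lower() for k in keywords}
    hits = []
    for start_time, word in tokens:
        normalized = word.lower().strip(".,!?")
        if normalized in wanted:
            hits.append((normalized, start_time))
    return hits


if __name__ == "__main__":
    # Hypothetical ASR output for a short news fragment.
    transcript = [(12.4, "parliament"), (13.1, "votes"), (14.0, "on"),
                  (14.2, "budget"), (20.7, "budget"), (21.5, "deficit")]
    print(spot_keywords(transcript, ["budget", "election"]))
    # -> [('budget', 14.2), ('budget', 20.7)]
```

The returned time offsets could then be used as entry points into the audio or video stream, which is the kind of fine-grained access to multimedia content that the abstract refers to.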
