Enhanced Multimedia Content Access and Exploitation Using Semantic Speech Retrieval

Techniques that use speech recognition technology to annotate spoken content automatically have long been characterized as holding unrealized promise for providing access to archives inundated with undisclosed multimedia material. This paper provides an overview of techniques and trends in semantic speech retrieval, which is taken to encompass all approaches offering meaning-based access to spoken word collections. We present descriptions, examples and insights for current techniques, including coping with real-world heterogeneity, aligning parallel resources and exploiting collateral collections. We also discuss ways in which speech recognition technology can be used to create multimedia connections that make new modes of access available to users. We conclude with an overview of the challenges for semantic speech retrieval in the workflow of a real-world archive, and with perspectives on future tasks in which speech retrieval integrates information related to affect and appeal, dimensions that transcend topic.
