Automated metadata extraction for semantic access to spoken word archives

Archival practice is shifting from the analogue to the digital world. Spoken word archives are a specific subset of heritage collections that pose interesting challenges for the field of language and speech technology. Given the enormous backlog of unannotated materials at audiovisual archives and the generally global level of item description, both collection disclosure and item access are at risk; (semi-)automated methods for analysis and annotation may help to increase the use and reuse of these rich content collections. Several HMI projects have investigated the interplay between evolving user scenarios and user requirements for spoken audio collections on the one hand, and the potential of automatic annotation and search technology for improved accessibility and novel search paradigms on the other. In this paper we present an overview of the state of the art in metadata generation for audio content and explain the crucial importance of involving user groups in the design of research agendas and road maps for novel applications in this domain.
