Easy Listening: Spoken Document Retrieval in CHoral

Abstract. Given the enormous backlog at audiovisual archives and the generally coarse level of item description, both collection disclosure and item access are at risk. At the same time, archival practice is seeking to evolve from the analogue to the digital world. CHoral investigates the role that automatic annotation and search technology can play in improving the disclosure of, and access to, digitized spoken word collections during and after this transfer. The core business of the CHoral project is to design and build spoken document retrieval technology for heritage collections. In this paper, we argue that, in addition to solving technological issues, closer attention must be paid to the workflow and daily practice at audiovisual archives on the one hand, and to the state of the art in technology on the other. Analysis of the interplay between the two is needed to ensure that new developments are mutually beneficial and that continuing cooperation can indeed bring the envisioned advancements.
