Relevance of ASR for the Automatic Generation of Keywords Suggestions for TV programs

Semantic access to multimedia content in audiovisual archives depends to a large extent on the quantity and quality of the metadata, particularly the content descriptions attached to individual items. However, given the growing amount of material created daily and the digitization of existing analogue collections, traditional manual annotation places heavy demands on resources, especially for large audiovisual archives. One way to address this challenge is to introduce (semi-)automatic annotation techniques for generating and/or enhancing metadata. The NWO-funded CATCH-CHOICE project has investigated the extraction of keywords from textual resources related to the TV programs to be archived (context documents), in collaboration with the Dutch audiovisual archive, Sound and Vision. Besides the program descriptions published by broadcasters on their websites, Automatic Speech Recognition (ASR) techniques from the CATCH-CHoral project also provide textual resources that might be relevant for suggesting keywords. This paper investigates the suitability of ASR output for generating such keywords, which we evaluate against manual annotations of the documents and against keywords automatically generated from context documents.
