An XML Resource Definition for Spoken Document Retrieval

This paper presents an XML resource definition designed to fit the architecture of a multilingual (Spanish, English, Basque) spoken document retrieval system. The XML resource stores all the information extracted from the audio signal and adds the structure required to create an index database and to retrieve information according to various criteria. It is built around the concept of a segment and provides generic but powerful mechanisms for characterizing segments and grouping them into sections. Audio and video files described through this XML resource can also be readily exploited in other tasks, such as topic tracking and speaker diarization.
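To make the segment and section idea concrete, the following is a minimal sketch of how such a resource might be built and queried. The abstract does not give the actual schema, so every element and attribute name here (`resource`, `segment`, `feature`, `section`, `segmentRef`) and the sample data are assumptions chosen for illustration only.

```python
import xml.etree.ElementTree as ET


def build_resource(source, segments, sections):
    """Build a hypothetical resource tree: characterized segments plus sections."""
    root = ET.Element("resource", {"source": source})

    # Each segment is a time span carrying generic name/value features.
    segments_el = ET.SubElement(root, "segments")
    for seg in segments:
        seg_el = ET.SubElement(
            segments_el,
            "segment",
            {
                "id": seg["id"],
                "start": f"{seg['start']:.2f}",
                "end": f"{seg['end']:.2f}",
            },
        )
        for name, value in seg["features"].items():
            ET.SubElement(seg_el, "feature", {"name": name, "value": value})

    # Sections group segments by reference, so a segment can belong to several.
    sections_el = ET.SubElement(root, "sections")
    for sec in sections:
        sec_el = ET.SubElement(
            sections_el, "section", {"id": sec["id"], "type": sec["type"]}
        )
        for seg_id in sec["segments"]:
            ET.SubElement(sec_el, "segmentRef", {"ref": seg_id})
    return root


def segments_with(root, name, value):
    """Return ids of segments that carry the given feature value."""
    return [
        seg.get("id")
        for seg in root.iter("segment")
        if any(
            f.get("name") == name and f.get("value") == value
            for f in seg.findall("feature")
        )
    ]


# Illustrative data only; not taken from the paper.
root = build_resource(
    "news.wav",
    [
        {"id": "s1", "start": 0.0, "end": 4.2,
         "features": {"language": "es", "speaker": "spk1"}},
        {"id": "s2", "start": 4.2, "end": 9.8,
         "features": {"language": "eu", "speaker": "spk2"}},
        {"id": "s3", "start": 9.8, "end": 15.0,
         "features": {"language": "es", "speaker": "spk2"}},
    ],
    [{"id": "sec1", "type": "story", "segments": ["s1", "s2"]}],
)
spanish_segments = segments_with(root, "language", "es")
xml_text = ET.tostring(root, encoding="unicode")
```

Keeping features as generic name/value pairs, rather than fixed attributes, is one way to let the same structure serve retrieval by language, speaker, or topic without changing the schema; whether the paper takes this approach is not stated in the abstract.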
