Towards a repository of digital talking books

Considerable effort has been devoted to increasing and broadening our speech and text data resources. Digital Talking Books (DTBs), which comprise both speech and text data, are an invaluable multimedia resource in this respect. To increase their potential for research, these DTBs have also been put through a speech-to-text alignment procedure, either word-based or phone-based. This paper describes the motivation for aligning DTBs and the method we used to do so. The alignment enables specific access interfaces for persons with special needs, as well as tools for easily detecting and indexing units (words, sentences, topics) in the spoken books. The alignment tool was implemented in a Weighted Finite State Transducer (WFST) framework, which provides an efficient way to combine different types of knowledge sources, such as alternative pronunciation rules. With this tool, a 2-hour-long spoken book was aligned in a single step in much less than real time. Finally, the paper describes new browsing interfaces that improve access to the DTBs and data retrieval from them.
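To illustrate why word-level alignment makes detecting and indexing units straightforward, the following is a minimal sketch in Python. It is not the tool described in the paper: the `AlignedWord` record, the `word_at` and `occurrences` helpers, and the sample timestamps are all hypothetical, and the sketch simply assumes the aligner has already produced a time-ordered list of words with start and end times in seconds.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedWord:
    """One word of the book text with its (assumed) position in the audio."""
    word: str
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording


def word_at(alignment, t):
    """Return the word being spoken at time t, or None during a pause."""
    starts = [w.start for w in alignment]
    i = bisect_right(starts, t) - 1  # last word starting at or before t
    if i >= 0 and t < alignment[i].end:
        return alignment[i]
    return None


def occurrences(alignment, query):
    """Return (start, end) spans of every occurrence of query, ignoring case."""
    q = query.lower()
    return [(w.start, w.end) for w in alignment if w.word.lower() == q]


# Hypothetical aligner output for a short phrase (values are made up).
alignment = [
    AlignedWord("digital", 0.00, 0.42),
    AlignedWord("talking", 0.42, 0.85),
    AlignedWord("books", 0.90, 1.40),
]
```

With such a structure, a browsing interface could highlight the word under the playback cursor via `word_at(alignment, t)` or jump to a search hit via `occurrences(alignment, "books")`. Sentence-level or topic-level indexing would follow the same pattern with coarser spans.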
