Segmentation and Annotation of Audiovisual Recordings Based on Automated Speech Recognition

Searching multimedia data, and audiovisual data in particular, is still a challenging task. The number of digital video recordings has increased dramatically as recording technology has become more affordable and network infrastructure has made download and streaming solutions easy to provide. However, the accessibility and traceability of their content for further use is still rather limited. In this paper we describe and evaluate a new approach to synchronizing auxiliary text-based material, e.g. presentation slides, with lecture video recordings. Our goal is to show that a tentative transliteration produced by automated speech recognition is sufficient for synchronization. We discuss and evaluate different approaches to synchronizing textual material with deficient transliterations of lecture recordings. Our evaluation data set comprises recordings in different languages and by various speakers.
