An Approach for Automatic Segmentation of Scenes in Educational Videos through the use of Audio Transcription and Semantic Annotation

Educational videos have become increasingly popular in recent years. As the amount of didactic video content on the web grows, it becomes valuable to relate a search term to a specific segment of a video rather than to the video as a whole. Better navigability gives users quicker access to the topics that interest them and lets them skip irrelevant content. This article proposes a method for automatic segmentation of scenes in educational videos using automatic audio transcription and semantic annotation. Such segmentation can improve content search within these videos and, in turn, the user experience on e-learning platforms and educational video repositories.
