Video scene segmentation through an early fusion multimodal approach

Temporal segmentation of video into scenes is a prerequisite for various tasks in Multimedia Information Retrieval, such as video summarization, content-based video retrieval, and video recommendation. There is, however, no satisfactory method for automatically segmenting video into scenes. State-of-the-art scene segmentation methods are multimodal, in order to match the multimodal nature of video. Although these methods are multimodal, no true early fusion method was found in the literature. Early fusion has been shown to be useful in related multimedia tasks, where potential correlations between data streams from different sources are discovered before the main processing step, improving results. Motivated by this situation, this PhD project proposes to investigate the impact of a true early fusion multimodal approach on the temporal video scene segmentation task.
