Modelling perceptions on the evaluation of video summarization

Abstract: Hours of video are uploaded to streaming platforms every minute, and recommender systems suggest popular and relevant videos to help users save time when searching. Recommender systems regularly rely on video summarization as an expert system to automatically identify suitable video entities and events. Since there is no well-established methodology for evaluating the relevance of summarized videos, some studies have used user annotations to gather evidence about the effectiveness of summarization methods. Aimed at modelling the user’s perceptions, which ultimately form the basis for testing video summarization systems, this paper proposes: (i) a guideline for collecting unrestricted user annotations, (ii) a novel metric called compression level of user annotation (CLUSA) to gauge the performance of video summarization methods, and (iii) a study on the quality of annotated video summaries collected from different assessment scales. These contributions make it possible to benchmark video summarization methods without constraints, even if the user annotations for each method are collected on different assessment scales. Our experiments showed that CLUSA is less susceptible to unbalanced compression data sets than other metrics, and hence achieves higher reliability estimates. CLUSA also allows results from different video summarization approaches to be compared.
