Eliciting User Preferences for Personalized Explanations for Video Summaries

Video summaries, or highlights, are a compelling alternative for exploring and contextualizing unprecedented amounts of video material. However, the summarization process is typically automatic, non-transparent, and potentially biased towards particular aspects depicted in the original video. Our aim is therefore to help users such as archivists and collection managers quickly understand which summaries are most representative of an original video. In this paper, we present empirical results on the utility of different types of visual explanations for making transparent to end users how representative a video summary is with respect to the original video. We consider four types of video summary explanations, which use, in different ways, the concepts extracted from the original video's subtitles and video stream, together with their prominence. The explanations are generated to meet target user preferences and express different dimensions of transparency: concept prominence, semantic coverage, distance, and quantity of coverage. In two user studies, we evaluate the utility of these visual explanations for achieving transparency for end users. Our results show that explanations representing all of the dimensions have the highest utility for transparency and, consequently, for understanding the representativeness of video summaries.
