Selective Keyframe Summarisation for Egocentric Videos Based on Semantic Concept Search

Large volumes of egocentric video are collected every day. While standard video summarisation produces all-purpose summaries, here we propose a method for selective video summarisation: the user can query the video with an unlimited vocabulary of terms, and the result is a time-tagged summary of keyframes related to the query concept. Our method uses a pre-trained Convolutional Neural Network (CNN) for the semantic search and visualises the generated summary as a compass. Two commonly used datasets were chosen for the evaluation: the UTEgo egocentric video dataset and the EDUB lifelog dataset.
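As an illustration of the pipeline the abstract describes, the following is a minimal Python sketch, assuming a ResNet-18 ImageNet classifier stands in for the paper's pre-trained CNN and NLTK's WordNet is used to map an arbitrary query term to related ImageNet classes. The function names (query_classes, score_frame, select_keyframes) and the top-k selection rule are illustrative assumptions, not the paper's exact method.

# Minimal sketch: open-vocabulary keyframe selection with a pre-trained CNN.
# Assumptions (not from the paper): ResNet-18 as the CNN, WordNet hypernym
# matching as the query-to-class mapping, top-k scoring as the selector.
# Requires: pip install torch torchvision nltk pillow
#           python -c "import nltk; nltk.download('wordnet')"
import torch
from torchvision import models
from torchvision.models import ResNet18_Weights
from nltk.corpus import wordnet as wn
from PIL import Image

weights = ResNet18_Weights.DEFAULT
cnn = models.resnet18(weights=weights).eval()
preprocess = weights.transforms()
labels = weights.meta["categories"]  # the 1000 ImageNet class names

def query_classes(query: str) -> list[int]:
    """Indices of ImageNet classes whose synsets match or fall under the query."""
    query_synsets = set(wn.synsets(query))
    hits = []
    for idx, label in enumerate(labels):
        for s in wn.synsets(label.split(",")[0].replace(" ", "_")):
            # accept exact matches or classes that are hyponyms of the query term
            if s in query_synsets or query_synsets & set(s.closure(lambda x: x.hypernyms())):
                hits.append(idx)
                break
    return hits

@torch.no_grad()
def score_frame(image: Image.Image, class_ids: list[int]) -> float:
    """Probability mass the CNN assigns to the query-related classes."""
    probs = cnn(preprocess(image).unsqueeze(0)).softmax(dim=1)[0]
    return probs[class_ids].sum().item()

def select_keyframes(frames, timestamps, query: str, top_k: int = 10):
    """Return (timestamp, score) pairs for the frames best matching the query."""
    ids = query_classes(query)
    scored = [(t, score_frame(f, ids)) for f, t in zip(frames, timestamps)]
    return sorted(scored, key=lambda x: x[1], reverse=True)[:top_k]

Calling select_keyframes(frames, timestamps, "dog") would then return the timestamps of the frames the CNN most strongly associates with dog-related classes; such time-tagged keyframes are the raw material for a compass-style summary visualisation of the kind the abstract mentions.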
