VC-I2R@ImageCLEF2017: Ensemble of Deep Learned Features for Lifelog Video Summarization

In this paper we describe our approach to the ImageCLEFlifelog summarization task. We submitted ten runs in total, which used only visual features, only metadata, or both. In the first step, a set of relevant frames is drawn from the whole lifelog. Such frames must be of good visual quality and match the given task semantically. For the automatic runs, this subset of images is clustered into events, and the key-frames are selected from the clusters iteratively. In the interactive runs, the user can select which frames to keep or discard in each interaction, and the clustering is adapted accordingly. We observe that the most relevant features depend on the context and the nature of the input lifelog.
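The automatic pipeline outlined above (filter frames, cluster them into events, pick key-frames from the clusters) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the clustering algorithm or the scoring functions, so the k-means step, the `quality` and `relevance` scores, the thresholds, and all function names below are assumptions chosen for illustration. The sketch also picks a single key-frame per cluster rather than reproducing the iterative selection.

```python
from math import dist


def filter_frames(frames, min_quality=0.5, min_relevance=0.5):
    """Keep frames that pass both (hypothetical) quality and relevance thresholds."""
    return [
        f for f in frames
        if f["quality"] >= min_quality and f["relevance"] >= min_relevance
    ]


def kmeans(points, k, iters=20):
    """Plain k-means with deterministic, evenly spaced initial centroids.

    k-means is an assumed stand-in; the abstract does not specify the method.
    Returns (clusters, centroids), where clusters hold indices into `points`.
    """
    step = len(points) / k
    centroids = [points[int(i * step)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for idx, p in enumerate(points):
            nearest = min(range(k), key=lambda c: dist(p, centroids[c]))
            clusters[nearest].append(idx)
        updated = []
        for j, members in enumerate(clusters):
            if members:
                dim = len(points[0])
                updated.append(tuple(
                    sum(points[i][d] for i in members) / len(members)
                    for d in range(dim)
                ))
            else:
                updated.append(centroids[j])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return clusters, centroids


def summarize(frames, k):
    """Return the ids of one key-frame per event cluster."""
    kept = filter_frames(frames)
    if not kept:
        return []
    k = min(k, len(kept))
    points = [tuple(f["feature"]) for f in kept]
    clusters, centroids = kmeans(points, k)
    keyframes = []
    for members, centroid in zip(clusters, centroids):
        if members:
            # The key-frame is the member closest to the cluster centroid.
            best = min(members, key=lambda i: dist(points[i], centroid))
            keyframes.append(kept[best]["id"])
    return keyframes


# Toy lifelog: two visually distinct events plus one low-quality frame.
frames = [
    {"id": "a", "feature": (0.0, 0.0), "quality": 0.9, "relevance": 0.9},
    {"id": "b", "feature": (0.2, 0.0), "quality": 0.9, "relevance": 0.9},
    {"id": "c", "feature": (0.1, 0.0), "quality": 0.9, "relevance": 0.9},
    {"id": "d", "feature": (5.0, 5.0), "quality": 0.9, "relevance": 0.9},
    {"id": "e", "feature": (5.2, 5.0), "quality": 0.9, "relevance": 0.9},
    {"id": "f", "feature": (5.1, 5.0), "quality": 0.9, "relevance": 0.9},
    {"id": "blurry", "feature": (9.0, 9.0), "quality": 0.1, "relevance": 0.9},
]
summary = summarize(frames, k=2)
```

In a real system the `feature` vectors would come from a learned descriptor and the scores from quality and relevance models. In an interactive variant, the frames a user discards could be removed from `frames` before clustering is run again.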
