Semantic Summarization of Egocentric Photo Stream Events

With the rapid growth in the number of wearable camera users in recent years, and in the amount of data they produce, there is a strong need for automatic retrieval and summarization techniques. This work addresses the problem of automatically summarizing egocentric photo streams captured by a wearable camera, taking an image retrieval perspective. After non-informative images are removed by a new CNN-based filter, the remaining images are ranked by relevance to ensure semantic diversity and then re-ranked by a novelty criterion to reduce redundancy. To assess the results, a new evaluation metric is proposed that takes into account the non-uniqueness of the solution. Experiments on a dataset of 7,110 images from 6 subjects, evaluated by experts, yielded 95.74% expert satisfaction and a Mean Opinion Score of 4.57 out of 5.0.
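The abstract does not spell out the novelty criterion used in the re-ranking step. As a rough illustration only, the sketch below swaps in a greedy Maximal Marginal Relevance (MMR) style trade-off: at each step it picks the image that best balances its relevance score against its cosine similarity to the images already chosen. The function names `cosine` and `novelty_rerank`, the trade-off weight `lam`, and the use of cosine similarity over per-image feature vectors are all assumptions for illustration, not the authors' method.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def novelty_rerank(relevance: List[float], features: List[Sequence[float]],
                   k: int, lam: float = 0.5) -> List[int]:
    """Hypothetical MMR-style stand-in for the novelty re-ranking step.

    relevance: relevance score per informative image (after filtering).
    features:  feature vector per image (e.g. CNN descriptors), same order.
    k:         number of images to keep in the summary.
    lam:       assumed weight on relevance; (1 - lam) weights redundancy.
    Returns the indices of the selected images, in selection order.
    """
    remaining = list(range(len(relevance)))
    selected: List[int] = []
    while remaining and len(selected) < k:
        def score(i: int) -> float:
            # Redundancy: similarity to the closest image already in the summary.
            redundancy = max((cosine(features[i], features[j]) for j in selected),
                             default=0.0)
            return lam * relevance[i] - (1.0 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With two near-duplicate images (indices 0 and 1) and one distinct image (index 2), `novelty_rerank([0.9, 0.85, 0.6], [[1, 0], [0.99, 0.01], [0, 1]], k=2)` returns `[0, 2]`, skipping the redundant second image; setting `lam=1.0` reduces it to a pure relevance ranking and returns `[0, 1]`.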
