Summarization of Egocentric Videos: A Comprehensive Survey

The introduction of wearable video cameras (e.g., GoPro) in the consumer market has promoted video life-logging, motivating users to generate large amounts of video data. This increasing flow of first-person video has led to a growing need for automatic video summarization adapted to the characteristics and applications of egocentric video. In this paper, we provide the first comprehensive survey of the techniques used specifically to summarize egocentric videos. We present a framework for first-person view summarization and compare the segmentation methods and selection algorithms used in the literature. Next, we describe the existing egocentric video datasets suitable for summarization, followed by the various evaluation methods. Finally, we analyze the challenges and opportunities in the field and propose new lines of research.