Personal-location-based temporal segmentation of egocentric videos for lifelogging applications

Abstract: Temporal video segmentation is useful for organizing and exploiting long egocentric videos. Previous work has focused on general-purpose methods designed to handle data acquired by different users. In contrast, egocentric video tends to be highly personal and meaningful to the specific user who acquires it. We propose a method to segment egocentric video according to the personal locations visited by the user. The method aims to provide a personalized output and lets the user specify which locations they want to keep track of. To account for negative locations (i.e., locations not specified by the user), we propose a negative rejection method which does not require any negative samples at training time. For the experiments, we collected a dataset of egocentric videos in 10 different personal locations, plus various negative ones. Results show that the method is accurate and compares favorably with the state of the art.
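The abstract does not say how negative rejection or segmentation is carried out, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes that some classifier trained only on the user's chosen locations already yields per-frame class probabilities, that a frame is rejected as negative when its highest probability falls below a threshold, and that segments are runs of consecutive frames sharing a label. The names `NEGATIVE`, `label_frames`, `to_segments`, and the threshold value are hypothetical.

```python
from itertools import groupby

# Hypothetical label for "none of the user's specified locations".
NEGATIVE = -1


def label_frames(probs, threshold=0.6):
    """Label each frame with its most likely location, or NEGATIVE.

    probs: list of per-frame probability lists, one entry per location.
    threshold: assumed confidence cut-off; frames below it are rejected.
    This confidence rule is an assumption made for illustration only.
    """
    labels = []
    for p in probs:
        best = max(range(len(p)), key=p.__getitem__)
        labels.append(best if p[best] >= threshold else NEGATIVE)
    return labels


def to_segments(labels):
    """Group consecutive equal labels into (start, end, label) tuples.

    Frame indices are 0-based and `end` is inclusive.
    """
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((start, start + length - 1, label))
        start += length
    return segments


if __name__ == "__main__":
    # Toy probabilities for four frames and two user-specified locations.
    toy_probs = [[0.9, 0.1], [0.8, 0.2], [0.5, 0.5], [0.1, 0.9]]
    print(to_segments(label_frames(toy_probs)))
```

Because rejection here relies only on the classifier's confidence, no negative examples are needed to fit it, which matches the constraint stated in the abstract; a practical system would likely also smooth labels over time before forming segments.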
