Recognizing Personal Contexts from Egocentric Images

Wearable cameras can gather first-person images of the environment, opening new opportunities for systems that assist users in their daily lives. This paper studies the problem of recognizing personal contexts from images acquired by wearable devices, a task with applications in daily routine analysis and stress monitoring. To assess the influence of device-specific features, such as the Field of View and the wearing modality, a dataset of five personal contexts is acquired using four different devices. We propose a benchmark classification pipeline that combines a one-class classifier, which detects negative samples (i.e., images not representing any of the personal contexts under analysis), with a classic one-vs-one multi-class classifier, which discriminates among the contexts. Several experiments are designed to compare the performance of many state-of-the-art representations for object and scene classification when used with data acquired by different wearable devices.
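The two-stage pipeline described above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation. The abstract does not say which classifiers are used, so this sketch makes two substitutions. The one-class stage becomes a simple distance-to-centroid threshold. Each pairwise classifier in the one-vs-one stage becomes a nearest-centroid rule. All names (`ContextPipeline`, `CentroidOneClass`, `OneVsOneCentroid`, `slack`) and the toy 2-D features are assumptions. In practice the inputs would be image representations such as the object and scene descriptors the paper compares.

```python
import math
from collections import Counter
from itertools import combinations

NEGATIVE = "negative"  # label for images outside every modeled context


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class CentroidOneClass:
    """Stand-in one-class model (an assumption, not the paper's): accept a
    sample if it lies within `slack` times the largest training distance from
    the centroid of all positive training samples. `slack` is hypothetical."""

    def __init__(self, slack=1.5):
        self.slack = slack

    def fit(self, X):
        self.center = centroid(X)
        self.radius = self.slack * max(dist(x, self.center) for x in X)
        return self

    def accepts(self, x):
        return dist(x, self.center) <= self.radius


class OneVsOneCentroid:
    """One-vs-one multi-class classifier: one binary nearest-centroid
    decision per pair of classes, combined by majority vote."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.centroids = {
            c: centroid([x for x, label in zip(X, y) if label == c])
            for c in self.classes
        }
        self.pairs = list(combinations(self.classes, 2))
        return self

    def predict_one(self, x):
        votes = Counter()
        for a, b in self.pairs:
            # Each binary classifier only chooses between classes a and b.
            da, db = dist(x, self.centroids[a]), dist(x, self.centroids[b])
            votes[a if da <= db else b] += 1
        # Most votes wins; ties go to the class that sorts first.
        return max(self.classes, key=lambda c: votes[c])


class ContextPipeline:
    """Stage 1 rejects negatives; stage 2 labels the accepted samples."""

    def __init__(self, slack=1.5):
        self.rejector = CentroidOneClass(slack)
        self.classifier = OneVsOneCentroid()

    def fit(self, X, y):
        # In this sketch, both stages are trained on positive samples only.
        self.rejector.fit(X)
        self.classifier.fit(X, y)
        return self

    def predict(self, X):
        return [
            self.classifier.predict_one(x) if self.rejector.accepts(x) else NEGATIVE
            for x in X
        ]


if __name__ == "__main__":
    # Toy 2-D "descriptors" for three hypothetical contexts.
    X = [[0, 0], [0, 1], [5, 0], [5, 1], [0, 5], [1, 5]]
    y = ["office", "office", "kitchen", "kitchen", "car", "car"]
    pipe = ContextPipeline().fit(X, y)
    print(pipe.predict([[0.2, 0.5], [20, 20], [4.8, 0.7]]))
    # ['office', 'negative', 'kitchen']
```

The key design choice is the ordering. Rejection runs first, so the multi-class stage only sees samples judged to belong to one of the known contexts. A sample far from all training data is labeled negative rather than being forced into the nearest context.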
