Characterizing everyday activities from visual lifelogs based on enhancing concept representation

Highlights:
- Experimental evaluation of activity characterization from lifelog images.
- Significant enhancement of the automatic detection of everyday concepts.
- Positive performance results for the detection of basic human activities.

Abstract:
Automatically identifying human activities from images taken by a wearable camera has many uses in assistive living. The proliferation of wearable visual recording devices such as the SenseCam and Google Glass is creating opportunities for the automatic analysis and use of digitally recorded everyday behavior, known as visual lifelogs. Such information can be recorded in order to identify human activities and build applications that support assistive living and enhance the human experience. Although the automatic detection of semantic concepts from images within a single, narrow domain has now reached a usable level of performance, in visual lifelogging the imagery captures a wide range of everyday concepts that vary enormously from one subject to another. This variety of semantic concepts across individual subjects challenges both automatic concept detection and the identification of human activities. In this paper, we characterize the everyday activities and behavior of subjects by applying a hidden conditional random field (HCRF) to an enhanced representation of the semantic concepts appearing in visual lifelogs. We first extract latent features of concept occurrences using weighted non-negative tensor factorization (WNTF), which exploits temporal patterns of concept occurrence, and then input these features to an HCRF-based model that automatically annotates activity sequences from a visual lifelog. Experiments demonstrate the efficacy of our algorithm in improving the accuracy of characterizing everyday activities from individual lifelogs. The overall contribution is a demonstration that, using images taken by wearable cameras, we can capture and characterize everyday behavior with a level of accuracy that allows the development of useful applications which measure, or change, that behavior.
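To make the two stages concrete, below is a minimal sketch of the first stage, weighted non-negative tensor factorization, applied to a three-way tensor of concept-detection scores. The tensor layout (e.g. days x time slots x concepts), the weight tensor W, and the multiplicative-update solver are illustrative assumptions for exposition, not the paper's exact formulation.

```python
import numpy as np

def khatri_rao(P, Q):
    # Column-wise Kronecker product of P (J x R) and Q (K x R) -> (J*K x R).
    return np.einsum('jr,kr->jkr', P, Q).reshape(-1, P.shape[1])

def wntf(X, W, rank, n_iter=200, eps=1e-9, seed=0):
    """Weighted non-negative CP factorization of a 3-way tensor X,
    minimizing || W * (X - [[A, B, C]]) ||_F^2 with multiplicative
    updates; W down-weights missing or unreliable concept detections."""
    rng = np.random.default_rng(seed)
    A = rng.random((X.shape[0], rank))
    B = rng.random((X.shape[1], rank))
    C = rng.random((X.shape[2], rank))
    for _ in range(n_iter):
        # Update each factor in turn against the matching mode unfolding.
        for M, P, Q, perm in ((A, B, C, (0, 1, 2)),
                              (B, A, C, (1, 0, 2)),
                              (C, A, B, (2, 0, 1))):
            Xm = np.transpose(X, perm).reshape(M.shape[0], -1)  # unfolded data
            Wm = np.transpose(W, perm).reshape(M.shape[0], -1)  # unfolded weights
            KR = khatri_rao(P, Q)                               # matching factor
            # Multiplicative update keeps M non-negative; eps avoids div-by-zero.
            M *= ((Wm * Xm) @ KR) / ((Wm * (M @ KR.T)) @ KR + eps)
    return A, B, C
```

The resulting factor loadings can then serve as the enhanced concept representation: each image or event is described by its latent loadings rather than by raw, noisy detector scores, with unreliable detections down-weighted through W.

The second stage labels sequences of these latent features with a hidden conditional random field. The sketch below covers only HCRF inference, i.e. predicting an activity label by marginalizing over hidden-state paths with a forward pass in log space; the parameter names and shapes (W_node, W_trans, W_class) are hypothetical, and training by maximizing the conditional likelihood is omitted.

```python
import numpy as np
from scipy.special import logsumexp

def hcrf_log_score(x, y, W_node, W_trans, W_class):
    """Unnormalized log-score log sum_h exp(score(y, h, x)) for one candidate
    activity label y, marginalizing over hidden-state paths h."""
    emit = x @ W_node.T + W_class[y]   # (T, H) per-frame hidden-state potentials
    alpha = emit[0]                    # forward log-messages over hidden states
    for t in range(1, len(x)):
        # W_trans[y][i, j] is the class-specific transition potential i -> j.
        alpha = emit[t] + logsumexp(alpha[:, None] + W_trans[y], axis=0)
    return logsumexp(alpha)            # log-sum over all hidden paths

def hcrf_predict(x, W_node, W_trans, W_class):
    """Predict the activity label for one sequence of latent features x (T x D).
    The partition function Z(x) is shared across labels, so argmax suffices."""
    scores = [hcrf_log_score(x, y, W_node, W_trans, W_class)
              for y in range(W_class.shape[0])]
    return int(np.argmax(scores))
```

Here x would be a window of WNTF latent features (one row per image or per event), and the label set would range over the target everyday activities.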
