Analyzing sedentary behavior in life-logging images

We describe a study that aims to understand physical activity and sedentary behavior in free-living settings. Forty participants wore a wearable camera for 3 to 5 days, yielding over 360,000 images, which were then fully annotated by experienced staff following a rigorous coding protocol. We designed a deep-learning-based classifier by adapting a model originally trained on ImageNet [1] and augmenting it with a spatio-temporal pyramid. Our results show that the proposed method outperforms state-of-the-art visual classification methods on our dataset: for most labels, the system achieves average accuracy across individuals of more than 90% for frequent labels and more than 80% for rare labels.
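
A minimal sketch of this kind of pipeline, not the authors' implementation: it uses an off-the-shelf ImageNet-pretrained ResNet-18 as a stand-in backbone, mean pooling over nested temporal windows as a simplified temporal pyramid, and scikit-learn's LinearSVC in place of LIBLINEAR [17]. The function names (`image_feature`, `temporal_pyramid`, `train`), the window/label structure, and all parameter values are illustrative assumptions.

```python
"""Illustrative sketch: CNN features from an ImageNet-pretrained model,
pooled over a temporal pyramid, fed to a linear classifier."""
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image
from sklearn.svm import LinearSVC

# ImageNet-pretrained backbone used as a fixed feature extractor
# (ResNet-18 is an assumption; the paper adapts its own pretrained model).
backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()   # drop the 1000-way ImageNet head
backbone.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def image_feature(path: str) -> np.ndarray:
    """512-D CNN feature for a single life-logging image."""
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    return backbone(x).squeeze(0).numpy()

def temporal_pyramid(features: np.ndarray, levels: int = 1) -> np.ndarray:
    """Concatenate mean-pooled features over nested temporal windows
    (whole sequence, halves, ...) -- a simplified stand-in for the
    spatio-temporal pyramid described above. Assumes each window holds
    at least 2**levels images."""
    pooled = []
    for level in range(levels + 1):
        for chunk in np.array_split(features, 2 ** level, axis=0):
            pooled.append(chunk.mean(axis=0))
    return np.concatenate(pooled)

def train(windows, labels):
    """windows: list of lists of image paths (consecutive frames);
    labels: one activity label per window, e.g. "sitting" (hypothetical)."""
    X = np.stack([temporal_pyramid(np.stack([image_feature(p) for p in w]))
                  for w in windows])
    clf = LinearSVC(C=1.0)   # linear classifier, in the spirit of LIBLINEAR [17]
    clf.fit(X, labels)
    return clf
```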

[1] Trevor Darrell, et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.

[2] Alan F. Smeaton, et al. Passively recognising human activities through lifelogging, Computers in Human Behavior, 2011.

[3] Steve Hodges, et al. SenseCam: A wearable camera that stimulates and rehabilitates autobiographical memory, Memory, 2011.

[4] C. Matthews, et al. Too much sitting: the population health science of sedentary behavior, Exercise and Sport Sciences Reviews, 2010.

[5] Fei-Fei Li, et al. ImageNet: A large-scale hierarchical image database, 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009.

[6] S. Carlson, et al. Trend and prevalence estimates based on the 2008 Physical Activity Guidelines for Americans, American Journal of Preventive Medicine, 2010.

[7] Deva Ramanan, et al. Detecting activities of daily living in first-person camera views, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.

[8] Thomas Mensink, et al. Improving the Fisher Kernel for Large-Scale Image Classification, ECCV, 2010.

[9] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, Communications of the ACM, 2012.

[10] Gordon Bell, et al. MyLifeBits: fulfilling the Memex vision, MULTIMEDIA '02, 2002.

[11] Antonio Torralba, et al. Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope, International Journal of Computer Vision, 2001.

[12] Janet E. Fulton, et al. 2008 Physical Activity Guidelines for Americans: Be Active, Healthy, and Happy!, 2008.

[13] Florent Perronnin, et al. High-dimensional signature compression for large-scale image classification, CVPR, 2011.

[14] Trevor Darrell, et al. DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition, ICML, 2013.

[15] Hao Su, et al. Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification, NIPS, 2010.

[16] Alan F. Smeaton, et al. Using Bluetooth and GPS metadata to measure event similarity in SenseCam images, 2007.

[17] Chih-Jen Lin, et al. LIBLINEAR: A Library for Large Linear Classification, Journal of Machine Learning Research, 2008.

[18] Noel E. O'Connor, et al. Exploiting context information to aid landmark detection in SenseCam images, 2006.

[19] Ling Bao, et al. Activity Recognition from User-Annotated Acceleration Data, Pervasive, 2004.

[20] David G. Lowe, et al. Object recognition from local scale-invariant features, Proceedings of the Seventh IEEE International Conference on Computer Vision, 1999.

[21] Rob Fergus, et al. Visualizing and Understanding Convolutional Networks, 2013.