Recognition of Activities of Daily Living from Egocentric Videos Using Hands Detected by a Deep Convolutional Network

Ambient assisted living systems aim to support older and impaired people with computer technology so that they can remain autonomous while maintaining a healthy lifestyle. Egocentric cameras have emerged as a powerful source of data for monitoring individuals performing activities of daily living, since they tend to focus on the area where the current activity takes place while capturing the manipulated objects and the positions of the hands. While previous research has concentrated on activity recognition based on object recognition, this study investigates the automatic extraction of additional features that model interactions between hands and objects, using a deep convolutional network. Experiments conducted on a realistic dataset demonstrate not only that these features improve activity recognition, but also that they can be extracted accurately.
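The abstract does not specify how hand–object interaction features are computed, but a minimal sketch can illustrate the idea: once a deep detector (e.g. a Faster R-CNN-style network) has produced bounding boxes for hands and candidate objects, simple geometric descriptors such as box overlap and normalized centre distance can encode whether a hand is interacting with an object. The function names, feature choices, and example boxes below are illustrative assumptions, not the paper's actual pipeline.

```python
import math

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def interaction_features(hand, obj):
    """Hypothetical hand-object interaction descriptor: overlap plus
    centre distance normalized by the hand box diagonal."""
    hcx, hcy = (hand[0] + hand[2]) / 2, (hand[1] + hand[3]) / 2
    ocx, ocy = (obj[0] + obj[2]) / 2, (obj[1] + obj[3]) / 2
    diag = math.hypot(hand[2] - hand[0], hand[3] - hand[1])
    return {
        "overlap": iou(hand, obj),
        "norm_distance": math.hypot(ocx - hcx, ocy - hcy) / diag,
    }

# Example: a hand box overlapping a cup box (coordinates are made up).
hand = (100, 120, 180, 200)
cup = (160, 140, 220, 210)
feats = interaction_features(hand, cup)
```

Such descriptors, concatenated with object-recognition features, could then be fed to any standard classifier; the appeal of this formulation is that it stays cheap to compute once the detector has run.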
