Online Detection of Long-Term Daily Living Activities by Weakly Supervised Recognition of Sub-Activities

In this paper, we address the detection of activities in long-term untrimmed videos. Delineating activities in time is important for analyzing large-scale video collections, yet accurate temporal segmentation of activities remains an open problem. Detecting daily-living activities is even more challenging because of their high intra-class and low inter-class variation and the complex temporal relationships among sub-activities performed in realistic settings. To tackle these problems, we propose an online activity detection framework based on the discovery of sub-activities. We model a long-term activity as a sequence of short-term sub-activities. We then use a weakly supervised classifier trained on the discovered sub-activities, which allows us to predict an ongoing activity before it has been completely observed. To achieve a more precise segmentation, we apply a greedy post-processing technique based on Markov models. We evaluate our framework on the DAHLIA and GAADRD daily-living activity datasets, where we achieve state-of-the-art activity detection results.
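The abstract does not specify how the greedy Markov-based post-processing works, so the following is only an illustrative sketch of one way such a step could look, not the authors' method. It assumes per-step class probabilities from some classifier and a first-order transition matrix; the names `greedy_markov_smooth`, `labels_to_segments`, `frame_probs`, and `transition` are hypothetical. At each step the label that maximizes the product of the transition probability from the previous label and the current classifier score is kept, which suppresses isolated label flips while still running online.

```python
import numpy as np


def greedy_markov_smooth(frame_probs, transition, prior=None):
    """Greedy online relabeling with a first-order Markov model (assumed setup).

    frame_probs: (n_steps, n_classes) classifier scores per time step.
    transition:  (n_classes, n_classes), transition[i, j] = P(next=j | prev=i).
    Each label is chosen from the previous label only, so the procedure
    is greedy and can run online (no backtracking as in Viterbi).
    """
    frame_probs = np.asarray(frame_probs, dtype=float)
    transition = np.asarray(transition, dtype=float)
    n_steps, n_classes = frame_probs.shape
    if prior is None:
        prior = np.full(n_classes, 1.0 / n_classes)  # uniform prior (assumption)

    labels = np.empty(n_steps, dtype=int)
    labels[0] = int(np.argmax(prior * frame_probs[0]))
    for t in range(1, n_steps):
        scores = transition[labels[t - 1]] * frame_probs[t]
        labels[t] = int(np.argmax(scores))
    return labels


def labels_to_segments(labels):
    """Collapse a label sequence into (start, end, label) runs, end inclusive."""
    segments = []
    start = 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            segments.append((start, t - 1, int(labels[start])))
            start = t
    return segments


# Toy example with two activity classes; step 2 is a spurious flip.
probs = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]
sticky = [[0.8, 0.2], [0.2, 0.8]]  # hypothetical "stay in the same activity" bias
smoothed = greedy_markov_smooth(probs, sticky)
segments = labels_to_segments(smoothed)
print(smoothed.tolist())  # [0, 0, 0, 0, 1, 1]
print(segments)           # [(0, 3, 0), (4, 5, 1)]
```

In this toy input the raw per-step argmax would switch to class 1 at step 2 and back at step 3; the transition bias removes that flip and yields two clean segments. A greedy rule like this trades the global optimality of Viterbi decoding for the ability to emit a label as soon as each step is observed.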
