Single Run Action Detector over Video Stream - A Privacy Preserving Approach

This paper takes initial steps toward designing and evaluating a vision-based system for privacy-ensured activity monitoring. The proposed technology uses Artificial Intelligence (AI)-empowered proactive systems that offer continuous monitoring, behavioral analysis, and modeling of human activities. To this end, the paper presents the Single Run Action Detector (S-RAD), a real-time, privacy-preserving action detector that performs end-to-end action localization and classification. It is based on Faster-RCNN combined with temporal shift modeling and segment-based sampling to capture human actions. Results on the UCF-Sports and UR Fall datasets show accuracy comparable to State-of-the-Art approaches, with significantly lower model size and computation demand and the ability to run in real time on an embedded edge device (e.g., Nvidia Jetson Xavier).
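The two temporal ideas named in the abstract, segment-based sampling and temporal shift modeling, can be illustrated with a minimal NumPy sketch. It is not the S-RAD implementation: the function names, the choice of the center frame of each segment, and the `fold_div=8` default are assumptions made for illustration, following the general idea of sparse segment sampling and of shifting a fraction of feature channels along the time axis.

```python
import numpy as np


def sample_segment_indices(num_frames, num_segments):
    """Split a clip into equal-length segments and pick each segment's center frame.

    Hypothetical helper: center-frame selection is an assumption of this sketch.
    """
    edges = np.linspace(0, num_frames, num_segments + 1)
    centers = (edges[:-1] + edges[1:]) / 2
    # Clamp so an index can never fall outside the clip.
    return np.minimum(centers.astype(int), num_frames - 1)


def temporal_shift(features, fold_div=8):
    """Shift a fraction of channels along the time axis.

    features: array of shape (T, C, H, W), one feature map per sampled frame.
    The first C // fold_div channels take their values from the next frame,
    the following C // fold_div channels from the previous frame, and the
    remaining channels are left unchanged. Vacated positions are zero-filled.
    """
    channels = features.shape[1]
    fold = channels // fold_div
    out = np.zeros_like(features)
    out[:-1, :fold] = features[1:, :fold]                    # from next frame
    out[1:, fold:2 * fold] = features[:-1, fold:2 * fold]    # from previous frame
    out[:, 2 * fold:] = features[:, 2 * fold:]               # untouched channels
    return out


# Usage: sample 8 frames from a 32-frame clip, then shift their feature maps.
indices = sample_segment_indices(num_frames=32, num_segments=8)
clip_features = np.random.rand(32, 16, 7, 7).astype(np.float32)
shifted = temporal_shift(clip_features[indices])
```

Because the shift only moves existing values between neighboring frames, it adds temporal context to a per-frame (2D) backbone without adding parameters, which is consistent with the low model size and computation demand the abstract emphasizes.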
