Robust Event Detection based on Spatio-Temporal Latent Action Unit using Skeletal Information

This paper proposes a novel dictionary learning approach to detect event anomalies using skeletal information extracted from RGBD video. The event action is represented as several latent action atoms composed of latent spatial and temporal attributes. We aim to construct a network that can learn from few examples as well as from user-defined rules. The skeleton frames are first clustered by K-means. Each skeleton frame is then assigned a varying weight parameter and fed into our Gradual Online Dictionary Learning (GODL) algorithm. During training, outlier frames are gradually filtered out by reducing their weight, which is inversely proportional to a cost. To strictly distinguish the event action from similar actions and robustly acquire its action units, we build a latent unit temporal structure for each sub-action. We validate the method on the example of fall event detection using the NTU RGB+D dataset, because it provides a benchmark available for comparison. We report the accuracy, recall, and precision achieved in the experiments. Compared with other existing dictionary learning methods, our approach achieves the best precision and accuracy in human fall event detection. As the noise ratio increases, our method retains the highest accuracy and the lowest variance.
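The weighting idea described above (K-means initialization, per-frame weights, and gradual down-weighting of high-cost frames) can be illustrated with a minimal sketch. This is not the GODL algorithm itself, whose details are not given here. It rests on several assumptions: the function names are hypothetical, ridge-regularised least-squares codes stand in for sparse coding, the cost is the squared reconstruction error, and the weight rule `1 / (1 + cost / tau)` with a shrinking `tau` is only one possible choice of a weight that decreases as the cost grows.

```python
import numpy as np


def kmeans_init(X, k, rng, n_iter=10):
    """Plain Lloyd's k-means; the centroids serve as initial dictionary atoms."""
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distance from every frame to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids


def gradual_weighted_dictionary_learning(X, k, n_iter=30, lam=1e-3,
                                         tau0=100.0, tau_min=1.0,
                                         decay=1.5, seed=0):
    """Hypothetical sketch: alternate coding, reweighting, and a weighted
    dictionary update, shrinking tau so that high-cost frames fade out."""
    rng = np.random.default_rng(seed)
    D = kmeans_init(X, k, rng)               # dictionary, shape (k, d)
    weights = np.ones(len(X))
    tau = tau0
    eye = np.eye(k)
    for _ in range(n_iter):
        # Coding step (assumption: ridge codes instead of sparse codes).
        A = np.linalg.solve(D @ D.T + lam * eye, D @ X.T).T   # shape (n, k)
        # Per-frame cost: squared reconstruction error.
        cost = ((X - A @ D) ** 2).sum(axis=1)
        # Assumed weight rule: the weight falls as the cost grows, and a
        # smaller tau makes the down-weighting progressively harsher.
        weights = 1.0 / (1.0 + cost / tau)
        # Dictionary update: weighted, lightly regularised least squares.
        AW = A * weights[:, None]
        D = np.linalg.solve(AW.T @ A + lam * eye, AW.T @ X)
        tau = max(tau / decay, tau_min)
    # The returned weights were computed just before the last dictionary update.
    return D, weights


if __name__ == "__main__":
    # Synthetic stand-in for skeleton frames: inliers near a 3-atom subspace
    # plus a few unstructured outlier frames.
    rng = np.random.default_rng(0)
    D_true = rng.normal(size=(3, 10))
    inliers = rng.normal(size=(200, 3)) @ D_true + 0.01 * rng.normal(size=(200, 10))
    outliers = 3.0 * rng.normal(size=(20, 10))
    X = np.vstack([inliers, outliers])
    D, w = gradual_weighted_dictionary_learning(X, k=3)
    print("mean inlier weight: ", w[:200].mean())
    print("mean outlier weight:", w[200:].mean())
```

In this toy setting, frames that the learned atoms reconstruct poorly receive smaller weights and therefore contribute less to later dictionary updates. Real skeleton data would replace the synthetic matrix `X`, with one flattened joint-coordinate vector per row.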
