Human fall-down event detection based on 2D skeletons and deep learning approach

The goal of this research is to apply state-of-the-art deep learning techniques to human fall-down event detection based on 2D skeletons extracted from RGB image sequences. We adopt a convolutional neural network (CNN) to extract the 2D skeletons of the people in each input frame. We then employ a recurrent neural network (RNN) with Long Short-Term Memory (LSTM) cells to process the resulting temporal series of skeletons. This design uses both spatial features and temporal information to classify each short-term action into one of five categories: standing, walking, falling, lying, and rising. After simple rule-based processing, the consecutive RNN outputs are used to detect the long-term human action (a fall-down event) and to decide whether to issue an alarm. The classification accuracy over the five sub-actions reaches 90%. Our contributions are twofold: (1) improving the performance of short-term human action recognition by combining a CNN with an RNN/LSTM, and (2) excluding fall-down events that require no assistance, which lowers the false alarm rate.
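The abstract does not specify the "simple rule processing" that turns the per-window RNN outputs into an alarm decision. Below is a minimal sketch of one plausible rule, assuming that an alarm should fire only when a "falling" output is followed by a sustained run of "lying" outputs, and that a person who rises, stands, or walks after falling needs no help. The function `should_alarm`, the `Action` enum, and the `min_lying` threshold are hypothetical illustrations, not the authors' method.

```python
from enum import Enum
from typing import Sequence


class Action(Enum):
    """The five short-term action classes named in the abstract."""
    STANDING = "standing"
    WALKING = "walking"
    FALLING = "falling"
    LYING = "lying"
    RISING = "rising"


def should_alarm(labels: Sequence[Action], min_lying: int = 3) -> bool:
    """Hypothetical rule: alarm if FALLING is followed by at least
    `min_lying` consecutive LYING outputs with no recovery in between.

    `labels` is the sequence of consecutive per-window classifier outputs.
    `min_lying` is an assumed threshold, not a value from the paper.
    """
    fell = False      # have we seen a fall that has not been recovered from?
    lying_run = 0     # consecutive LYING outputs since the last fall
    for label in labels:
        if label is Action.FALLING:
            # A (new) fall starts; restart the lying counter.
            fell, lying_run = True, 0
        elif not fell:
            # Lying without a preceding fall (e.g. in bed) is ignored.
            continue
        elif label is Action.LYING:
            lying_run += 1
            if lying_run >= min_lying:
                return True
        else:
            # RISING, STANDING or WALKING after a fall: the person
            # recovered on their own, so this fall needs no alarm.
            fell, lying_run = False, 0
    return False
```

For example, `should_alarm([Action.WALKING, Action.FALLING, Action.LYING, Action.LYING, Action.LYING])` returns `True`, whereas the same sequence ending in `Action.RISING` instead of the final `Action.LYING` returns `False`. Requiring a recovery-free lying run is one way to realize the second contribution (suppressing alarms for falls that need no help); the threshold would in practice depend on the window length and frame rate.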
