A deep neural network for real-time detection of falling humans in naturally occurring scenes

Abstract

We introduce a novel approach to human fall detection in naturally occurring scenes. The problem matters because falls cause thousands of deaths every year, and vision-based approaches offer a promising way to detect them. We treat fall detection as an instance of action detection and, in addition to recognizing a fall, we localize its temporal extent, relying on deep networks for both tasks. In the training stage, trimmed video clips of the four phases of a fall (standing, falling, fallen and not moving) are converted into four categories of so-called dynamic images. These are used to train a deep ConvNet that scores and labels each dynamic image. In the testing stage, a sliding window over an untrimmed video generates a set of sub-videos, each of which is converted into a dynamic image. Using the labels that the trained ConvNet predicts for these dynamic images, a “standing watch” classifies the video as a fall or not by looking for the four phases occurring in sequence. To localize the temporal extent of the fall, we propose a difference score method (DSM) based on adjacent dynamic images in the temporal sequence. We also collect a new dataset, the YouTube Fall Dataset (YTFD), which contains 430 falling incidents and 176 normal activities, and use it to train the deep network. We perform experiments on datasets of varying complexity: the Le2i fall detection dataset, the Multiple Cameras Fall Dataset, the High Quality Fall Simulation Dataset and our own YTFD. The results demonstrate the effectiveness and efficiency of our approach.
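The abstract does not spell out how a clip becomes a dynamic image. A common construction, and a plausible reading of the term, is approximate rank pooling (Bilen et al., 2016), which collapses T frames I_1..I_T into a single image d = sum_t alpha_t * I_t with weights alpha_t = 2(T - t + 1) - (T + 1)(H_T - H_{t-1}), where H_t is the t-th harmonic number and H_0 = 0. The NumPy sketch below assumes that construction applied directly to raw RGB frames; the function names are hypothetical and the paper's exact preprocessing may differ.

```python
import numpy as np


def rank_pool_weights(num_frames: int) -> np.ndarray:
    """Approximate rank pooling weights alpha_t for t = 1..T (assumed construction)."""
    t = np.arange(1, num_frames + 1)
    # Harmonic numbers H_0..H_T, with H_0 = 0.
    harmonic = np.concatenate(([0.0], np.cumsum(1.0 / t)))
    return 2.0 * (num_frames - t + 1) - (num_frames + 1) * (harmonic[num_frames] - harmonic[t - 1])


def dynamic_image(frames: np.ndarray) -> np.ndarray:
    """Collapse a clip of shape (T, H, W, C) into one float (H, W, C) image."""
    frames = np.asarray(frames, dtype=np.float64)
    alpha = rank_pool_weights(frames.shape[0])
    # d = sum_t alpha_t * I_t, contracting alpha with the leading (time) axis.
    return np.tensordot(alpha, frames, axes=1)


def to_uint8(image: np.ndarray) -> np.ndarray:
    """Min-max rescale a dynamic image to 0..255 so a standard ImageNet-style ConvNet can take it."""
    lo, hi = float(image.min()), float(image.max())
    if hi - lo < 1e-12:  # constant image: nothing to stretch
        return np.zeros(image.shape, dtype=np.uint8)
    return np.round((image - lo) / (hi - lo) * 255.0).astype(np.uint8)


# Example: a 10-frame clip whose pixels brighten steadily over time.
clip = np.stack([np.full((4, 4, 3), float(k)) for k in range(10)])
d = dynamic_image(clip)
print(d.shape, bool((d > 0).all()))  # (4, 4, 3) True
```

Because the weights sum to zero, a static clip yields an all-zero dynamic image, while pixels that change steadily over the clip receive large positive or negative values. This is what lets a single image summarize the motion of one phase of a fall.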

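For the testing stage, the abstract states only that an untrimmed video is cut into sliding-window sub-videos, that the ConvNet labels each one, and that a “standing watch” declares a fall when the four phases appear in sequence. The sketch below assumes the labels have already been predicted (one per window) and shows one way such a watch could work as a small state machine. The window length and stride, the rule that a new “standing” label re-arms the watch, and all function names are hypothetical choices, not taken from the paper. The difference score method (DSM) used for temporal localization is not described in the abstract and is not reproduced here, so the frame range printed at the end is only the coarse span of the matching windows.

```python
from typing import List, Optional, Sequence, Tuple

# The four phases of a fall, in the order they are expected to occur.
PHASES = ("standing", "falling", "fallen", "not_moving")


def sliding_windows(num_frames: int, length: int, stride: int) -> List[Tuple[int, int]]:
    """Half-open [start, end) frame ranges of the sub-videos cut from an untrimmed video."""
    if length <= 0 or stride <= 0:
        raise ValueError("length and stride must be positive")
    return [(s, s + length) for s in range(0, num_frames - length + 1, stride)]


def standing_watch(labels: Sequence[str]) -> Optional[Tuple[int, int]]:
    """Look for standing -> falling -> fallen -> not_moving in per-window labels.

    Returns (index of the first 'falling' window, index of the 'not_moving'
    window that completes the sequence), or None if no fall is found.
    Repeated labels are allowed; seeing 'standing' again re-arms the watch,
    so a person who stumbles and gets back up is not reported as a fall.
    """
    stage = 0  # 0 = not armed yet; k >= 1 = waiting for PHASES[k]
    fall_start = None
    for i, label in enumerate(labels):
        if label == "standing":
            stage, fall_start = 1, None
        elif stage > 0 and label == PHASES[stage]:
            if label == "falling":
                fall_start = i
            stage += 1
            if stage == len(PHASES):
                return fall_start, i
    return None


# Example with hypothetical ConvNet predictions, one label per window.
windows = sliding_windows(num_frames=100, length=16, stride=8)
labels = ["standing"] * 4 + ["falling"] * 2 + ["fallen"] * 3 + ["not_moving"] * 2
hit = standing_watch(labels)
if hit is not None:
    first, last = hit
    print("fall in frames", windows[first][0], "to", windows[last][1])  # fall in frames 32 to 88
```

Requiring the phases in order, rather than any single "fallen" prediction, is what makes such a watch robust to activities like lying down deliberately, which would produce "fallen"-like frames without a preceding "falling" phase.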