Deep Learning for Fall Detection: Three-Dimensional CNN Combined With LSTM on Video Kinematic Data

Fall detection is an important public healthcare problem: timely detection enables prompt delivery of medical care to the injured. A popular nonintrusive solution is based on video from ambient cameras, but such methods usually require a large dataset to train a classifier and are sensitive to image quality. Real fall data are hard to collect, so training sets are built from simulated falls and remain limited in size. To address these problems, a fall detection method based on a three-dimensional convolutional neural network (3-D CNN) is developed. It uses only video kinematic data to train an automatic feature extractor, which circumvents the large fall dataset that deep learning solutions would otherwise require. Whereas a 2-D CNN encodes only spatial information, the 3-D convolution extracts motion features from the temporal sequence, which is important for fall detection. To further locate the region of interest in each frame, a long short-term memory (LSTM) based spatial visual attention scheme is incorporated. The Sports-1M sports dataset, which contains no fall examples, is used to train the 3-D CNN, which is then combined with the LSTM to train a classifier on a fall dataset. Experiments on a fall detection benchmark verify the proposed scheme, with accuracy as high as 100%. Superior performance is also obtained on other activity databases.
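
The two mechanisms named in the abstract can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the function names, the toy clips, and the temporal-difference kernel are all assumptions chosen for illustration, and the attention scores are passed in directly, whereas in an LSTM-based scheme they would be derived from the recurrent hidden state. The first part shows that a kernel spanning the time axis responds to motion but not to a static scene, which a purely spatial 2-D kernel applied frame by frame cannot do. The second part shows soft spatial attention as a softmax-weighted sum over per-location feature vectors.

```python
import math
from typing import List, Tuple

Volume = List[List[List[float]]]  # indexed as volume[t][y][x]


def conv3d_valid(clip: Volume, kernel: Volume) -> Volume:
    """Naive single-channel 3-D cross-correlation, stride 1, no padding."""
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out: Volume = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                # Sum over the kernel's temporal AND spatial extent.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += kernel[dt][dy][dx] * clip[t + dt][y + dy][x + dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def soft_attention(features: List[List[float]],
                   scores: List[float]) -> Tuple[List[float], List[float]]:
    """Softmax the per-location scores, then return (weights, weighted feature sum).

    `scores` is a hypothetical input here; an LSTM-based attention scheme
    would compute it from the previous hidden state.
    """
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(features[0])
    attended = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return weights, attended


# Toy clips: 3 frames, each 1 x 3 pixels (assumed data, not from the paper).
static_clip = [[[0.0, 1.0, 0.0]] for _ in range(3)]                   # nothing moves
moving_clip = [[[1.0, 0.0, 0.0]], [[0.0, 1.0, 0.0]], [[0.0, 0.0, 1.0]]]  # pixel shifts right

# Kernel of shape 2 x 1 x 1 that takes the difference between consecutive frames.
temporal_diff = [[[-1.0]], [[1.0]]]

static_resp = conv3d_valid(static_clip, temporal_diff)
moving_resp = conv3d_valid(moving_clip, temporal_diff)

# Two spatial locations with 2-D features and equal scores.
weights, attended = soft_attention([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

With these toy inputs the static clip produces an all-zero response while the moving clip does not, and equal attention scores yield equal weights over the two locations. A practical system would use a deep learning framework with learned multi-channel kernels rather than hand-written loops.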
