Fall Detection in Videos With Trajectory-Weighted Deep-Convolutional Rank-Pooling Descriptor

Automatic fall detection in videos could enable timely delivery of medical services to injured elderly people who have fallen while living alone. Deep ConvNets have been used to detect fall actions, but problems remain in deep video representations for fall detection. First, video frames are fed directly into deep ConvNets, so the visual features of human actions may be disturbed by the surrounding environment. Second, redundant frames make it harder to encode the temporal structure of human actions. To address these problems, this paper presents the trajectory-weighted deep-convolutional rank-pooling descriptor (TDRD) for fall detection, which is robust to the surrounding environment and can effectively describe the dynamics of human actions in long videos. First, the CNN feature map of each frame is extracted with a deep ConvNet. Then, we present a new kind of trajectory attention map, built from improved dense trajectories, to optimally localize the subject area. Next, the CNN feature map of each frame is weighted by its corresponding trajectory attention map to obtain the trajectory-weighted convolutional visual feature of the human region. Further, we propose a cluster pooling method to reduce the temporal redundancy of the trajectory-weighted convolutional features of a video. Finally, rank pooling is used to encode the dynamics of the cluster-pooled sequence, yielding our TDRD. With TDRD and SVM classifiers, we obtain superior results on the SDUFall dataset and comparable performance on the UR dataset and the Multiple cameras dataset.
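The pipeline described above (attention-weighted frame features, cluster pooling, then rank pooling) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the attention normalisation, the greedy distance-threshold grouping used for cluster pooling, and the ridge-regression approximation of rank pooling are all hypothetical choices made here, and the feature maps and trajectory attention maps are assumed to have been computed already.

```python
import numpy as np


def trajectory_weighted_features(feature_maps, attention_maps):
    """Spatially pool each frame's CNN feature map using its attention map.

    feature_maps:   (T, H, W, C) array of per-frame convolutional features.
    attention_maps: (T, H, W) array of non-negative trajectory weights.
    Returns a (T, C) array, one weighted feature vector per frame.
    """
    # Assumption: normalise each attention map to sum to 1 (guarded for empty maps).
    totals = attention_maps.sum(axis=(1, 2), keepdims=True)
    weights = attention_maps / np.maximum(totals, 1e-8)
    # Weighted sum over the spatial grid.
    return np.einsum("thw,thwc->tc", weights, feature_maps)


def cluster_pool(features, threshold):
    """Assumed cluster pooling: average runs of consecutive, similar frames.

    A new cluster starts when a frame lies farther than `threshold`
    (Euclidean distance) from the mean of the current cluster.
    """
    clusters = [[features[0]]]
    for frame in features[1:]:
        centre = np.mean(clusters[-1], axis=0)
        if np.linalg.norm(frame - centre) > threshold:
            clusters.append([frame])
        else:
            clusters[-1].append(frame)
    return np.stack([np.mean(c, axis=0) for c in clusters])


def rank_pool(sequence, ridge=1e-3):
    """Approximate rank pooling by ridge regression onto the time index.

    The fitted weight vector is used as the video descriptor.
    """
    n, dim = sequence.shape
    times = np.arange(1, n + 1, dtype=float)
    # Time-varying mean smooths the sequence before fitting.
    smoothed = np.cumsum(sequence, axis=0) / times[:, None]
    gram = smoothed.T @ smoothed + ridge * np.eye(dim)
    return np.linalg.solve(gram, smoothed.T @ times)


def tdrd_sketch(feature_maps, attention_maps, threshold=1.0):
    """Chain the three steps into one C-dimensional descriptor."""
    per_frame = trajectory_weighted_features(feature_maps, attention_maps)
    pooled = cluster_pool(per_frame, threshold)
    return rank_pool(pooled)
```

The resulting vector has one entry per feature channel and could then be passed to an SVM classifier, as the abstract describes. The `threshold` and `ridge` values are placeholders that would need tuning for real data.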
