Learning spatiotemporal representations for human fall detection in surveillance video

Abstract: In this paper, we propose a computer-vision-based framework that detects falls in surveillance videos. First, we employ background subtraction and rank pooling to model the spatial and temporal representations in videos, respectively. We then introduce a novel three-stream convolutional neural network as an event classifier. Silhouettes and their motion history images serve as input to the first two streams, while dynamic images, whose temporal duration equals that of the motion history images, serve as input to the third stream. Finally, we apply voting to the event classification results to perform multi-camera fall detection. The main novelty of our method relative to conventional ones is that high-quality spatiotemporal representations are learned at different levels to take full advantage of appearance and motion information. Extensive experiments on two widely used fall datasets demonstrate the effectiveness of the proposed method.
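To make two of the ingredients named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard temporal-template update for a motion history image (a pixel flagged in the current binary mask is set to a duration `tau`; every other pixel decays by one toward zero) and a strict-majority rule for the multi-camera vote. The abstract does not specify either detail, so the function names, the decay step, and the voting rule are all illustrative assumptions.

```python
def update_mhi(mhi, mask, tau):
    """One motion-history-image step (assumed standard update rule).

    mhi  : 2-D list of ints, the current motion history image
    mask : 2-D list of 0/1 values, the binary mask for the current frame
    tau  : int, temporal duration (number of frames remembered)
    """
    return [
        [tau if m else max(0, h - 1) for h, m in zip(mhi_row, mask_row)]
        for mhi_row, mask_row in zip(mhi, mask)
    ]


def vote_fall(per_camera_decisions):
    """Strict-majority vote over per-camera fall decisions (assumed rule).

    Returns True only if more than half of the cameras report a fall.
    """
    votes = sum(1 for d in per_camera_decisions if d)
    return 2 * votes > len(per_camera_decisions)


# Toy 1x3 "video": a blob moving left to right over three frames.
masks = [
    [[1, 0, 0]],
    [[0, 1, 0]],
    [[0, 0, 1]],
]
mhi = [[0, 0, 0]]
for mask in masks:
    mhi = update_mhi(mhi, mask, tau=3)

# Recent motion is brightest, older motion has decayed.
print(mhi)

# Hypothetical per-camera classifier outputs for one event.
print(vote_fall([True, True, False]))
```

In the toy sequence the most recently active pixel holds the largest value, which is how a single motion history image encodes both where and how recently motion occurred. A strict majority treats an even split as "no fall"; a deployment that prefers to avoid missed falls might resolve ties the other way.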
