A spatio-temporal deep learning approach for human action recognition in infrared videos

Human action recognition in indoor environments can be crucial for avoiding serious accidents and damage. Applications range from monitoring elderly people living alone or persons with disabilities to monitoring people working alone in a chamber or in an isolated industrial environment. These scenarios demand automatic, near real-time activity recognition and alerting to save lives and assets. Because the sensing modality should work round the clock in a non-intrusive manner, we opted for a thermal infrared (IR) camera, which captures the heat emitted by objects in the scene and generates an image. Motivated by the recent success of convolutional neural networks (CNNs) for human action recognition in IR images, we extend that work by incorporating one additional dimension, namely the temporal information. We designed and implemented a 3D-CNN that learns both the spatial and the sequential features in thermal IR videos. The network comprises two blocks, each with two convolution layers and one max-pooling layer, and automatically constructs features from raw data, incorporating both spatial and temporal information to learn actions. Network parameters are learned with the back-propagation algorithm, and the learning is supervised. Eight action classes are considered: Walking, Standing, Falling, Lying, Sitting, Falling from chair, Sitting up (recovering from a fall from a sitting posture), and Getting up (recovering from a fall from a lying posture). To evaluate the proposed framework, IR videos of different actions were recorded in three diverse home environments: inside a study room, inside a bedroom, and in the garden. The dataset comprised 2641 training and 894 test IR videos, each half a second long, with actions performed by more than 50 volunteers. Experimental results indicate that the proposed spatio-temporal deep learning architecture achieves 85% classification accuracy on the 894 complex test videos of the IR action dataset.
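To make the described layout concrete (two blocks, each with two convolutions followed by one max-pooling layer), the following minimal sketch traces how a clip's tensor shape might change through such a network. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the input resolution, number of frames, kernel sizes, padding, or channel counts, so the values used here (a single-channel 16-frame 64x64 clip, 3x3x3 kernels with padding 1, 2x2x2 pooling, 16 and 32 channels) and the helper names `conv3d_shape`, `maxpool3d_shape`, and `trace_network` are all hypothetical.

```python
from typing import List, Tuple

# A clip is described as (channels, frames, height, width).
Shape = Tuple[int, int, int, int]


def conv3d_shape(shape: Shape, out_channels: int,
                 kernel=(3, 3, 3), padding=(1, 1, 1), stride=(1, 1, 1)) -> Shape:
    """Output shape of a 3D convolution: floor((d + 2p - k) / s) + 1 per axis."""
    _, *dims = shape
    out = tuple((d + 2 * p - k) // s + 1
                for d, k, p, s in zip(dims, kernel, padding, stride))
    return (out_channels, *out)


def maxpool3d_shape(shape: Shape, kernel=(2, 2, 2)) -> Shape:
    """Output shape of non-overlapping 3D max pooling (stride equals kernel)."""
    channels, *dims = shape
    return (channels, *(d // k for d, k in zip(dims, kernel)))


def trace_network(input_shape: Shape,
                  block_channels=(16, 32)) -> List[Tuple[str, Shape]]:
    """Trace shapes through blocks of conv -> conv -> max pool.

    The channel counts are assumed values, not taken from the paper.
    """
    trace = [("input", input_shape)]
    shape = input_shape
    for i, channels in enumerate(block_channels, start=1):
        shape = conv3d_shape(shape, channels)
        trace.append((f"block{i}.conv1", shape))
        shape = conv3d_shape(shape, channels)
        trace.append((f"block{i}.conv2", shape))
        shape = maxpool3d_shape(shape)
        trace.append((f"block{i}.pool", shape))
    return trace


if __name__ == "__main__":
    # Assumed input: 1 thermal channel, 16 frames, 64x64 pixels.
    for name, shape in trace_network((1, 16, 64, 64)):
        print(f"{name:14s} {shape}")
```

Under these assumed settings the final feature map has shape (32, 4, 16, 16), that is, 32768 values that a classifier head would map to the eight action classes. The point of the sketch is that the convolution and pooling kernels slide over the frame axis as well as height and width, which is what lets a 3D-CNN capture motion in addition to appearance.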
