Indoor human activity recognition using high-dimensional sensors and deep neural networks

Many smart home applications rely on indoor human activity recognition. This task is currently tackled mainly with video camera sensors. However, such sensors suffer from fundamental technical deficiencies in an indoor environment, and their use often results in a breach of privacy. In contrast, a radar sensor avoids most of these shortcomings and, in particular, preserves privacy. In this paper, we investigate a novel approach toward automatic indoor human activity recognition, feeding high-dimensional radar and video camera sensor data into several deep neural networks. Furthermore, we explore the efficacy of sensor fusion as a solution in less-than-ideal circumstances. We validate our approach on two newly constructed and published data sets that consist of 2347 and 1505 samples distributed over six different types of gestures and events, respectively. Our analysis shows that, for the radar sensor, the best-performing model is a three-dimensional convolutional neural network that takes sequential range-Doppler maps as input. This model achieves error rates of 12.22% and 2.97% on the gestures and the events data sets, respectively. A pretrained residual network is employed for the video camera sensor data and obtains error rates of 1.67% and 3.00% on the same data sets. We show that combining both sensors offers a clear benefit for activity recognition in less-than-ideal circumstances.
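
The abstract does not state how the two sensors are combined, so the following is only a minimal sketch of one common option, late fusion by weighted averaging of per-class probabilities. The function names, the weight, and the probability values are all hypothetical and chosen for illustration; they are not taken from the paper. The sketch shows how a confident radar prediction can still decide the outcome when the video model is uninformative, for example in a dark room.

```python
from typing import List, Sequence


def late_fusion(radar_probs: Sequence[float],
                video_probs: Sequence[float],
                radar_weight: float = 0.5) -> List[float]:
    """Weighted average of two per-class probability vectors.

    This is an assumed fusion rule for illustration, not the paper's method.
    """
    if len(radar_probs) != len(video_probs):
        raise ValueError("both models must score the same set of classes")
    if not 0.0 <= radar_weight <= 1.0:
        raise ValueError("radar_weight must lie in [0, 1]")
    fused = [radar_weight * r + (1.0 - radar_weight) * v
             for r, v in zip(radar_probs, video_probs)]
    total = sum(fused)
    # Renormalize so the result is again a probability distribution.
    return [p / total for p in fused]


def predict(probs: Sequence[float]) -> int:
    """Index of the most probable class (first index on ties)."""
    return max(range(len(probs)), key=lambda i: probs[i])


# Hypothetical outputs for six activity classes.
radar = [0.05, 0.70, 0.05, 0.10, 0.05, 0.05]       # radar model is confident
video_dark = [0.17, 0.16, 0.17, 0.17, 0.17, 0.16]  # video model is near-uniform

fused = late_fusion(radar, video_dark)
fused_class = predict(fused)
```

With these made-up inputs the fused prediction follows the radar model, since the near-uniform video scores add almost no preference for any class. In practice the weight would need to be tuned on validation data, and fusing learned features rather than output probabilities is an equally plausible design.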
