HEAR: Human Action Recognition via Neural Networks on Homomorphically Encrypted Data

Remote monitoring to support “aging in place” is an active area of research. Advanced computer vision technology based on deep learning can provide near real-time home monitoring to detect falls and symptoms related to seizure and stroke. Affordable webcams, together with cloud computing services (to run machine learning algorithms), can potentially bring significant social and health benefits. However, such monitoring has not been deployed in practice because of privacy and security concerns: people may feel uncomfortable sending videos of their daily activities (with potentially sensitive private information) to a computing service provider (e.g., a commercial cloud). In this paper, we propose a novel strategy to resolve this dilemma by applying fully homomorphic encryption (FHE) to an alternative representation of human actions (i.e., skeleton joints), which guarantees information confidentiality while retaining high-performance action detection at a low cost. We design an FHE-friendly neural network for action recognition and present a secure neural network evaluation strategy to achieve near real-time action detection. On real-world datasets, our framework for private inference achieves 87.99% recognition accuracy (86.21% sensitivity and 99.14% specificity in detecting falls) with a latency of 3.1 seconds. Our evaluation shows that our carefully designed and fine-tuned method reduces the inference latency by 23.81% to 74.67% compared with a straightforward implementation.
