Human Action Recognition from Various Data Modalities: A Review

Human Action Recognition (HAR), which aims to understand human behaviors and assign category labels to them, has a wide range of applications and has therefore attracted increasing attention in the field of computer vision. Human actions can be represented using various data modalities, such as RGB, skeleton, depth, infrared sequences, point clouds, event streams, audio, acceleration, radar, and WiFi. These modalities encode different sources of useful yet distinct information, and each has its own advantages and application scenarios. Consequently, many existing works have investigated different types of approaches for HAR using various modalities. In this paper, we present a comprehensive survey of HAR from the perspective of the input data modalities. Specifically, we review both hand-crafted feature-based and deep learning-based methods for single data modalities, as well as methods based on multiple modalities, including fusion-based frameworks and co-learning-based approaches. Current benchmark datasets for HAR are also introduced. Finally, we discuss some potentially important research directions in this area.
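The abstract distinguishes fusion-based frameworks from co-learning approaches for multi-modal HAR. As one minimal illustration of the former, the sketch below shows score-level (late) fusion: each modality-specific model is assumed to already output class logits, these are turned into probabilities, and a weighted average gives the fused prediction. The function names, the modality keys (`rgb`, `skeleton`), and the equal weights are hypothetical choices for illustration; this is not presented as the method of any particular surveyed work.

```python
from typing import Dict

import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis (the class axis)."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def late_fusion(modality_logits: Dict[str, np.ndarray],
                weights: Dict[str, float]) -> np.ndarray:
    """Score-level fusion: weighted average of per-modality class probabilities.

    modality_logits maps a (hypothetical) modality name to logits of shape
    (..., num_classes); every modality must share the same class set.
    """
    total = sum(weights[m] for m in modality_logits)
    fused = sum(weights[m] * softmax(logits)
                for m, logits in modality_logits.items())
    # Normalizing by the weight sum keeps the result a valid distribution.
    return fused / total


# Usage: two hypothetical single-clip predictions over three action classes.
probs = late_fusion(
    {"rgb": np.array([2.0, 1.0, 0.1]),
     "skeleton": np.array([0.5, 2.5, 0.2])},
    {"rgb": 0.5, "skeleton": 0.5},
)
predicted_class = int(np.argmax(probs))
```

Late fusion is only one option: feature-level (early or intermediate) fusion instead combines modality features before classification, while co-learning approaches transfer knowledge between modalities rather than merging their outputs.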


[214]  Mohamed Airouche,et al.  A new technique based on 3D convolutional neural networks and filtering optical flow maps for action classification in infrared video , 2019 .

[215]  Jian-Huang Lai,et al.  Jointly Learning Heterogeneous Features for RGB-D Activity Recognition , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[216]  Mohammed Bennamoun,et al.  A New Representation of Skeleton Sequences for 3D Action Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[217]  Li Chen,et al.  Survey of pedestrian action recognition techniques for autonomous driving , 2020, Tsinghua Science and Technology.

[218]  Ying Wu,et al.  Cross-View Action Modeling, Learning, and Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[219]  Lei Shi,et al.  Skeleton-Based Action Recognition With Directed Graph Neural Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[220]  Christian Wolf,et al.  Sequential Deep Learning for Human Action Recognition , 2011, HBU.

[221]  Hema Swetha Koppula,et al.  Learning human activities and object affordances from RGB-D videos , 2012, Int. J. Robotics Res..

[222]  Pichao Wang,et al.  Joint Distance Maps Based Action Recognition With Convolutional Neural Networks , 2017, IEEE Signal Processing Letters.

[223]  Nasser Kehtarnavaz,et al.  A survey of depth and inertial sensor fusion for human action recognition , 2015, Multimedia Tools and Applications.

[224]  Zhouyu Fu,et al.  Semantic-Based Surveillance Video Retrieval , 2007, IEEE Transactions on Image Processing.

[225]  Arijit Mukherjee,et al.  A Reservoir-based Convolutional Spiking Neural Network for Gesture Recognition from DVS Input , 2020, 2020 International Joint Conference on Neural Networks (IJCNN).

[226]  Yao Wang,et al.  Action Recognition Based on Two-Stream Convolutional Networks With Long-Short-Term Spatiotemporal Features , 2020, IEEE Access.

[227]  Pawan Kumar Singh,et al.  Fuzzy Integral-Based CNN Classifier Fusion for 3D Skeleton Action Recognition , 2021, IEEE Transactions on Circuits and Systems for Video Technology.

[228]  Balasubramanian Raman,et al.  Evaluating fusion of RGB-D and inertial sensors for multimodal human action recognition , 2020, J. Ambient Intell. Humaniz. Comput..

[229]  Nicu Sebe,et al.  Spatio-Temporal Vector of Locally Max Pooled Features for Action Recognition in Videos , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[230]  Ying Wu,et al.  Mining actionlet ensemble for action recognition with depth cameras , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[231]  Dima Damen,et al.  EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[232]  Dacheng Tao,et al.  Context Aware Graph Convolution for Skeleton-Based Action Recognition , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[233]  Zhengyou Zhang,et al.  Microsoft Kinect Sensor and Its Effect , 2012, IEEE Multim..

[234]  Xuelong Li,et al.  A 3D-CNN and LSTM Based Multi-Task Learning Architecture for Action Recognition , 2019, IEEE Access.

[235]  Trevor Darrell,et al.  Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[236]  S. Z. Gürbüz,et al.  Deep convolutional autoencoder for radar-based classification of similar aided and unaided human activities , 2018, IEEE Transactions on Aerospace and Electronic Systems.

[237]  Wenhao Yu,et al.  An attention mechanism based convolutional LSTM network for video action recognition , 2019, Multimedia Tools and Applications.

[238]  Tae-Kyun Kim,et al.  Learning and Refining of Privileged Information-Based RNNs for Action Recognition from Depth Sequences , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[239]  Wanqing Li,et al.  Action recognition based on a bag of 3D points , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops.

[240]  Bowen Du,et al.  EV-Gait: Event-Based Robust Gait Recognition Using Dynamic Vision Sensors , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[241]  Angelo M. Sabatini,et al.  Machine Learning Methods for Classifying Human Physical Activity from On-Body Accelerometers , 2010, Sensors.

[242]  Ling Bao,et al.  Activity Recognition from User-Annotated Acceleration Data , 2004, Pervasive.

[243]  Zicheng Liu,et al.  HON4D: Histogram of Oriented 4D Normals for Activity Recognition from Depth Sequences , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[244]  Gioia Ballin,et al.  3D Flow Estimation for Human Action Recognition from Colored Point Clouds , 2013, BICA 2013.

[245]  Guo-Jun Qi,et al.  Differential Recurrent Neural Networks for Action Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[246]  Qing Zhang,et al.  A Survey on Human Motion Analysis from Depth Data , 2013, Time-of-Flight and Depth Imaging.

[247]  Xu Chen,et al.  Actional-Structural Graph Convolutional Networks for Skeleton-Based Action Recognition , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[248]  Jing Lv,et al.  InfAR dataset: Infrared action recognition at different times , 2016, Neurocomputing.

[249]  Kaishun Wu,et al.  WiFall: Device-free fall detection by wireless networks , 2017, IEEE INFOCOM 2014 - IEEE Conference on Computer Communications.

[250]  Andrey Ignatov,et al.  Real-time human activity recognition from accelerometer data using Convolutional Neural Networks , 2018, Appl. Soft Comput..

[251]  Apostol Natsev,et al.  YouTube-8M: A Large-Scale Video Classification Benchmark , 2016, ArXiv.

[252]  Nasser Kehtarnavaz,et al.  Improving Human Action Recognition Using Fusion of Depth Camera and Inertial Sensors , 2015, IEEE Transactions on Human-Machine Systems.

[253]  Yang Xiao,et al.  Action Recognition for Depth Video using Multi-view Dynamic Images , 2018, Inf. Sci..

[254]  Cordelia Schmid,et al.  Learning realistic human actions from movies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[255]  Gang Wang,et al.  Multi-modal feature fusion for action recognition in RGB-D sequences , 2014, 2014 6th International Symposium on Communications, Control and Signal Processing (ISCCSP).

[256]  Mohammad H. Mahoor,et al.  Human activity recognition using multi-features and multiple kernel learning , 2014, Pattern Recognit..

[257]  Mahbub Hassan,et al.  KEH-Gait: Towards a Mobile Healthcare User Authentication System by Kinetic Energy Harvesting , 2017, NDSS.

[258]  Mohammed Bennamoun,et al.  Learning Clip Representations for Skeleton-Based 3D Action Recognition , 2018, IEEE Transactions on Image Processing.

[259]  Jefersson Alex dos Santos,et al.  SkeleMotion: A New Representation of Skeleton Joint Sequences based on Motion Information for 3D Action Recognition , 2019, 2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS).

[260]  Ivan Laptev,et al.  On Space-Time Interest Points , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[261]  Wenjun Zeng,et al.  Skeleton-Indexed Deep Multi-Modal Feature Learning for High Performance Human Action Recognition , 2018, 2018 IEEE International Conference on Multimedia and Expo (ICME).

[262]  Michael J. Black,et al.  Pose-conditioned joint angle limits for 3D human pose reconstruction , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[263]  Yann LeCun,et al.  A Closer Look at Spatiotemporal Convolutions for Action Recognition , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[264]  Cordelia Schmid,et al.  Human Detection Using Oriented Histograms of Flow and Appearance , 2006, ECCV.

[265]  Sanghoon Lee,et al.  Ensemble Deep Learning for Skeleton-Based Action Recognition Using Temporal Sliding LSTM Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[266]  Thomas Brox,et al.  ECO: Efficient Convolutional Network for Online Video Understanding , 2018, ECCV.

[267]  Mahbub Hassan,et al.  Simultaneous Energy Harvesting and Gait Recognition Using Piezoelectric Energy Harvester , 2020, IEEE Transactions on Mobile Computing.

[268]  S. Gong,et al.  Recognising action as clouds of space-time interest points , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[269]  Ruslan Salakhutdinov,et al.  Action Recognition using Visual Attention , 2015, NIPS 2015.

[270]  Jing Zhang,et al.  RGB-D-based action recognition datasets: A survey , 2016, Pattern Recognit..

[271]  Jing Zhang,et al.  Action Recognition From Depth Maps Using Deep Convolutional Neural Networks , 2016, IEEE Transactions on Human-Machine Systems.

[272]  Dong Ming,et al.  Infrared gait recognition based on wavelet transform and support vector machine , 2010, Pattern Recognit..

[273]  Soharab Hossain Shaikh,et al.  A comprehensive survey of human action recognition with spatio-temporal interest point (STIP) detector , 2015, The Visual Computer.

[274]  Hang Zhao,et al.  HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization , 2017, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[275]  Patrick van der Smagt,et al.  Two-stream RNN/CNN for action recognition in 3D videos , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[276]  Jun Kong,et al.  Collaborative multimodal feature learning for RGB-D action recognition , 2019, J. Vis. Commun. Image Represent..

[277]  Somayeh Danafar,et al.  Action Recognition for Surveillance Applications Using Optic Flow and SVM , 2007, ACCV.

[278]  Marco La Cascia,et al.  3D skeleton-based human action classification: A survey , 2016, Pattern Recognit..

[279]  Vittorio Murino,et al.  Audio-Visual Model Distillation Using Acoustic Images , 2019, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).

[280]  Hong Liu,et al.  A Survey on 3D Skeleton-Based Action Recognition Using Learning Method , 2020, Cyborg and bionic systems.

[281]  John R. Hershey,et al.  Attention-Based Multimodal Fusion for Video Description , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[282]  Balasubramanian Raman,et al.  Deep residual infrared action recognition by integrating local and global spatio-temporal cues , 2019, Infrared Physics & Technology.

[283]  Gang Wang,et al.  SSNet: Scale Selection Network for Online 3D Action Prediction , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[284]  Thomas Young,et al.  II. The Bakerian Lecture. On the theory of light and colours , 1802, Philosophical Transactions of the Royal Society of London.

[285]  Jing Li,et al.  Global Temporal Representation Based CNNs for Infrared Action Recognition , 2018, IEEE Signal Processing Letters.

[286]  Xiaodong Yang,et al.  Super Normal Vector for Activity Recognition Using Depth Sequences , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[287]  Leonidas J. Guibas,et al.  PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space , 2017, NIPS.

[288]  Minglin Chen,et al.  3D Behavior Recognition Based on Multi-Modal Deep Space-Time Learning , 2019 .

[289]  Cordelia Schmid,et al.  Discriminative spatial saliency for image classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[290]  Yi Zhu,et al.  Hidden Two-Stream Convolutional Networks for Action Recognition , 2017, ACCV.

[291]  Cordelia Schmid,et al.  Towards Understanding Action Recognition , 2013, 2013 IEEE International Conference on Computer Vision.

[292]  Tinne Tuytelaars,et al.  Rank Pooling for Action Recognition , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[293]  Dima Damen,et al.  Scaling Egocentric Vision: The EPIC-KITCHENS Dataset , 2018, ArXiv.

[294]  Mubarak Shah,et al.  A 3-dimensional sift descriptor and its application to action recognition , 2007, ACM Multimedia.

[295]  Cordelia Schmid,et al.  A Robust and Efficient Video Representation for Action Recognition , 2015, International Journal of Computer Vision.

[296]  Song-Chun Zhu,et al.  Joint action recognition and pose estimation from video , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[297]  Juergen Gall,et al.  Cross-Modal Knowledge Distillation for Action Recognition , 2019, 2019 IEEE International Conference on Image Processing (ICIP).

[298]  Bo Yu,et al.  Convolutional Neural Networks for human activity recognition using mobile sensors , 2014, 6th International Conference on Mobile Computing, Applications and Services.

[299]  Louis-Philippe Morency,et al.  Multimodal Machine Learning: A Survey and Taxonomy , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[300]  Roberto Cipolla,et al.  Extracting Spatiotemporal Interest Points using Global Information , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[301]  Bir Bhanu,et al.  Human Activity Recognition in Thermal Infrared Imagery , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Workshops.

[302]  Jaeyoung Yang,et al.  Activity Recognition Based on RFID Object Usage for Smart Mobile Devices , 2011, Journal of Computer Science and Technology.

[303]  Alexandros André Chaaraoui,et al.  Fusion of Skeletal and Silhouette-Based Features for Human Action Recognition with RGB-D Devices , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[304]  Bart Selman,et al.  Human Activity Detection from RGBD Images , 2011, Plan, Activity, and Intent Recognition.

[305]  Francesco Fioranelli,et al.  Action Recognition Using Indoor Radar Systems , 2019 .

[306]  Serge J. Belongie,et al.  Behavior recognition via sparse spatio-temporal features , 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[307]  Mohammed Bennamoun,et al.  Global Regularizer and Temporal-Aware Cross-Entropy for Skeleton-Based Early Action Recognition , 2018, ACCV.

[308]  Ripul Ghosh,et al.  A spatio-temporal deep learning approach for human action recognition in infrared videos , 2018, Optical Engineering + Applications.

[309]  Fei-Fei Li,et al.  Grouplet: A structured image representation for recognizing human and object interactions , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[310]  Lin Sun,et al.  Human Action Recognition Using Factorized Spatio-Temporal Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[311]  Fu Xiong,et al.  3DV: 3D Dynamic Voxel for Action Recognition in Depth Video , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[312]  Arif Mahmood,et al.  Histogram of Oriented Principal Components for Cross-View Action Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[313]  Austin Reiter,et al.  Interpretable 3D Human Action Analysis with Temporal Convolutional Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[314]  Gang Wang,et al.  NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[315]  Yong Du,et al.  Skeleton based action recognition with convolutional neural network , 2015, 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR).

[316]  Rémi Ronfard,et al.  A survey of vision-based methods for action representation, segmentation and recognition , 2011, Comput. Vis. Image Underst..

[317]  Tido Röder,et al.  Documentation Mocap Database HDM05 , 2007 .

[318]  Fei-Fei Li,et al.  Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[319]  Nitish V. Thakor,et al.  Spatiotemporal Filtering for Event-Based Action Recognition , 2019, ArXiv.

[320]  Zhaozheng Yin,et al.  Human Activity Recognition Using Wearable Sensors by Deep Convolutional Neural Networks , 2015, ACM Multimedia.

[321]  Yu Qiao,et al.  RPAN: An End-to-End Recurrent Pose-Attention Network for Action Recognition in Videos , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[322]  Yiran Chen,et al.  D3-LND: A two-stream framework with discriminant deep descriptor, linear CMDT and nonlinear KCMDT descriptors for action recognition , 2019, Neurocomputing.

[323]  Christopher G. Harris,et al.  A Combined Corner and Edge Detector , 1988, Alvey Vision Conference.

[324]  Pichao Wang,et al.  A Hybrid Network for Large-Scale Action Recognition from RGB and Depth Modalities , 2020, Sensors.

[325]  Gang Wang,et al.  NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[326]  Ruzena Bajcsy,et al.  Berkeley MHAD: A comprehensive Multimodal Human Action Database , 2013, 2013 IEEE Workshop on Applications of Computer Vision (WACV).

[327]  Cordelia Schmid,et al.  Activity representation with motion hierarchies , 2013, International Journal of Computer Vision.

[328]  Fei Wu,et al.  Spatio-Temporal Graph Routing for Skeleton-Based Action Recognition , 2019, AAAI.

[329]  Junsong Yuan,et al.  Space-Time Event Clouds for Gesture Recognition: From RGB Cameras to Event Cameras , 2019, 2019 IEEE Winter Conference on Applications of Computer Vision (WACV).

[330]  Fei Han,et al.  Space-Time Representation of People Based on 3D Skeletal Data: A Review , 2016, Comput. Vis. Image Underst..

[331]  Zhuolin Jiang,et al.  Learning Spatiotemporal Features for Infrared Action Recognition with 3D Convolutional Neural Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[332]  Ling Shao,et al.  From handcrafted to learned representations for human action recognition: A survey , 2016, Image Vis. Comput..

[333]  Gang Wang,et al.  Skeleton-Based Action Recognition Using Spatio-Temporal LSTM Network with Trust Gates , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[334]  Jitendra Malik,et al.  Recognizing action at a distance , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[335]  Lin Gao,et al.  Graph CNNs with Motif and Variable Temporal Block for Skeleton-Based Action Recognition , 2019, AAAI.

[336]  Pichao Wang,et al.  Scene Flow to Action Map: A New Representation for RGB-D Based Action Recognition with Convolutional Neural Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[337]  Bogdan Kwolek,et al.  Improving multimodal action representation with joint motion history context , 2019, J. Vis. Commun. Image Represent..

[338]  Kaishun Wu,et al.  We Can Hear You with Wi-Fi! , 2014, IEEE Transactions on Mobile Computing.

[339]  Thomas Brox,et al.  Chained Multi-stream Networks Exploiting Pose, Motion, and Appearance for Action Classification and Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[340]  Matthew J. Hausknecht,et al.  Beyond short snippets: Deep networks for video classification , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[341]  Ruzena Bajcsy,et al.  Sequence of the Most Informative Joints (SMIJ): A new representation for human skeletal action recognition , 2012, 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[342]  Christopher Hunt,et al.  Notes on the OpenSURF Library , 2009 .

[343]  Hongsong Wang,et al.  Modeling Temporal Dynamics and Spatial Configurations of Actions Using Two-Stream Recurrent Neural Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[344]  Arnold W. M. Smeulders,et al.  Timeception for Complex Action Recognition , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[345]  Mohan M. Trivedi,et al.  Joint Angles Similarities and HOG2 for Action Recognition , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[346]  Cordelia Schmid,et al.  P-CNN: Pose-Based CNN Features for Action Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[347]  Paul J. M. Havinga,et al.  Activity Recognition Using Inertial Sensing for Healthcare, Wellbeing and Sports Applications: A Survey , 2010, ARCS Workshops.

[348]  Ronen Basri,et al.  Actions as space-time shapes , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[349]  Ling Shao,et al.  Learning Discriminative Representations from RGB-D Video Data , 2013, IJCAI.

[350]  Ying Wu,et al.  Robust 3D Action Recognition with Random Occupancy Patterns , 2012, ECCV.

[351]  Nitish Srivastava,et al.  Unsupervised Learning of Video Representations using LSTMs , 2015, ICML.

[352]  Chaoxing Huang.  Event-based Action Recognition Using Timestamp Image Encoding Network , 2020, ArXiv.

[353]  Fabio Viola,et al.  The Kinetics Human Action Video Dataset , 2017, ArXiv.

[354]  Marta Marrón Romera,et al.  3DFCNN: real-time action recognition using 3D deep neural networks with raw depth information , 2020, Multimedia Tools and Applications.

[355]  Wei Wang,et al.  Device-Free Human Activity Recognition Using Commercial WiFi Devices , 2017, IEEE Journal on Selected Areas in Communications.

[356]  Imen Jegham,et al.  Vision-based human action recognition: An overview and real world challenges , 2020, Digit. Investig..

[357]  Marwan Torki,et al.  Human Action Recognition Using a Temporal Hierarchy of Covariance Descriptors on 3D Joint Locations , 2013, IJCAI.

[358]  Yifan Zhang,et al.  Skeleton-Based Action Recognition With Shift Graph Convolutional Network , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[359]  Martin Masek,et al.  Joint movement similarities for robust 3D action recognition using skeletal data , 2015, J. Vis. Commun. Image Represent..

[360]  Cewu Lu,et al.  Range-Sample Depth Feature for Action Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[361]  Antonio Fernández-Caballero,et al.  A survey of video datasets for human action and activity recognition , 2013, Comput. Vis. Image Underst..

[362]  Barbara Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[363]  Costas J. Spanos,et al.  WiFi and Vision Multimodal Learning for Accurate and Robust Device-Free Human Activity Recognition , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[364]  Gary M. Weiss,et al.  Activity recognition using cell phone accelerometers , 2011, SKDD.

[365]  Yun Fu,et al.  Bilinear heterogeneous information machine for RGB-D action recognition , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[366]  Duc A. Tran,et al.  A Study on Human Activity Recognition Using Accelerometer Data from Smartphones , 2014, The 11th International Conference on Mobile Systems and Pervasive Computing (MobiSPC-2014).

[367]  Du Tran,et al.  What Makes Training Multi-Modal Classification Networks Hard? , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[368]  Sridha Sridharan,et al.  Two Stream LSTM: A Deep Fusion Framework for Human Action Recognition , 2017, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV).

[369]  Alex ChiChung Kot,et al.  Collaborative Learning of Gesture Recognition and 3D Hand Pose Estimation with Multi-order Feature Analysis , 2020, ECCV.

[370]  Pichao Wang,et al.  Depth Pooling Based Large-Scale 3-D Action Recognition With Convolutional Neural Networks , 2018, IEEE Transactions on Multimedia.

[371]  Cordelia Schmid,et al.  AVA: A Video Dataset of Spatio-Temporally Localized Atomic Visual Actions , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[372]  Soon Myoung Chung,et al.  Orthogonal moment-based descriptors for pose shape query on 3D point cloud patches , 2016, Pattern Recognit..

[373]  Xinyu Li,et al.  A Survey of Deep Learning-Based Human Activity Recognition in Radar , 2019, Remote. Sens..

[374]  Qinghua Huang,et al.  Learning Shape-Motion Representations from Geometric Algebra Spatio-Temporal Model for Skeleton-Based Action Recognition , 2019, 2019 IEEE International Conference on Multimedia and Expo (ICME).

[375]  J.K. Aggarwal,et al.  Human activity analysis , 2011, ACM Comput. Surv..

[376]  Andrew Zisserman,et al.  Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.

[377]  Sung Wook Baik,et al.  Action Recognition in Video Sequences using Deep Bi-Directional LSTM With CNN Features , 2018, IEEE Access.

[378]  Wenbing Zhao,et al.  A Survey of Applications and Human Motion Recognition with Microsoft Kinect , 2015, Int. J. Pattern Recognit. Artif. Intell..

[379]  Wenjun Zeng,et al.  Multi-Modality Multi-Task Recurrent Neural Network for Online Action Detection , 2019, IEEE Transactions on Circuits and Systems for Video Technology.

[380]  Radha Poovendran,et al.  Human activity recognition for video surveillance , 2008, 2008 IEEE International Symposium on Circuits and Systems.

[381]  Lei Wang,et al.  A Comparative Review of Recent Kinect-Based Action Recognition Algorithms , 2019, IEEE Transactions on Image Processing.

[382]  Andrew Zisserman,et al.  A Short Note about Kinetics-600 , 2018, ArXiv.

[383]  Andrew Zisserman,et al.  Video Action Transformer Network , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[384]  Alberto Del Bimbo,et al.  Temporal Binary Representation for Event-Based Action Recognition , 2020, 2020 25th International Conference on Pattern Recognition (ICPR).

[385]  Gang Wang,et al.  Global Context-Aware Attention LSTM Networks for 3D Action Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[386]  Sergio Escalera,et al.  RGB-D-based Human Motion Recognition with Deep Learning: A Survey , 2017, Comput. Vis. Image Underst..

[387]  Min-Chun Hu,et al.  Human action recognition and retrieval using sole depth information , 2012, ACM Multimedia.

[388]  Jiebo Luo,et al.  Recognizing realistic actions from videos , 2009, CVPR.

[389]  Yanfeng Wang,et al.  Dynamic Multiscale Graph Neural Networks for 3D Skeleton Based Human Motion Prediction , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[390]  Qian Wang,et al.  Deep Learning-Based Gait Recognition Using Smartphones in the Wild , 2018, IEEE Transactions on Information Forensics and Security.

[391]  Jian Liu,et al.  Skepxels: Spatio-temporal Image Representation of Human Skeleton Joints for Action Recognition , 2017, CVPR Workshops.

[392]  Qi Tian,et al.  Human Daily Action Analysis with Multi-view and Color-Depth Data , 2012, ECCV Workshops.

[393]  Jiaying Liu,et al.  Modality Compensation Network: Cross-Modal Adaptation for Action Recognition , 2020, IEEE Transactions on Image Processing.

[394]  Andrew W. Fitzgibbon,et al.  The Vitruvian manifold: Inferring dense correspondences for one-shot human pose estimation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[395]  Petros Maragos,et al.  Multimodal human action recognition in assistive human-robot interaction , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[396]  Hichem Snoussi,et al.  Exploring a rich spatial-temporal dependent relational model for skeleton-based action recognition by bidirectional LSTM-CNN , 2020, Neurocomputing.

[397]  Hefei Ling,et al.  XwiseNet: action recognition with Xwise separable convolutions , 2020, Multimedia Tools and Applications.

[398]  Yong Du,et al.  Hierarchical recurrent neural network for skeleton based action recognition , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[399]  Xiaoshuai Sun,et al.  Two-Stream 3-D convNet Fusion for Action Recognition in Videos With Arbitrary Size and Length , 2018, IEEE Transactions on Multimedia.

[400]  Aun Irtaza,et al.  Robust Human Activity Recognition Using Multimodal Feature-Level Fusion , 2019, IEEE Access.

[401]  Wei Wang,et al.  Understanding and Modeling of WiFi Signal Based Human Activity Recognition , 2015, MobiCom.

[402]  Christoph Meinel,et al.  Exploring multimodal video representation for action recognition , 2016, 2016 International Joint Conference on Neural Networks (IJCNN).

[403]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[404]  Andrew Zisserman,et al.  Convolutional Two-Stream Network Fusion for Video Action Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[405]  Andrew Zisserman,et al.  A Short Note on the Kinetics-700 Human Action Dataset , 2019, ArXiv.

[406]  Juan Carlos Niebles,et al.  Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words , 2006, BMVC.

[407]  Gang Wang,et al.  Skeleton-Based Human Action Recognition With Global Context-Aware Attention LSTM Networks , 2017, IEEE Transactions on Image Processing.

[408]  Andrew W. Fitzgibbon,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR 2011.

[409]  Kien A. Hua,et al.  Temporal Order-Preserving Dynamic Quantization for Human Action Recognition from Multimodal Sensor Streams , 2015, ICMR.

[410]  Luc Van Gool,et al.  Spatio-Temporal Channel Correlation Networks for Action Classification , 2018, ECCV.

[411]  Bolei Zhou,et al.  Moments in Time Dataset: One Million Videos for Event Understanding , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[412]  André Bourdoux,et al.  Indoor Person Identification Using a Low-Power FMCW Radar , 2018, IEEE Transactions on Geoscience and Remote Sensing.

[413]  Ling-Yu Duan,et al.  HARD-Net: Hardness-AwaRe Discrimination Network for 3D Early Activity Prediction , 2020, European Conference on Computer Vision.

[414]  Andrew Owens,et al.  Audio-Visual Scene Analysis with Self-Supervised Multisensory Features , 2018, ECCV.

[415]  Deva Ramanan,et al.  Attentional Pooling for Action Recognition , 2017, NIPS.

[416]  Daniel P. Siewiorek,et al.  Activity recognition and monitoring using multiple sensors on different body positions , 2006, International Workshop on Wearable and Implantable Body Sensor Networks (BSN'06).

[417]  Tao Mei,et al.  Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[418]  Nico Blodow,et al.  Action recognition in intelligent environments using point cloud features extracted from silhouette sequences , 2008, RO-MAN 2008 - The 17th IEEE International Symposium on Robot and Human Interactive Communication.

[419]  Chen Chen,et al.  Memory Attention Networks for Skeleton-Based Action Recognition , 2018, IEEE Transactions on Neural Networks and Learning Systems.

[420]  Youngwook Kim,et al.  Human Activity Classification Based on Micro-Doppler Signatures Using a Support Vector Machine , 2009, IEEE Transactions on Geoscience and Remote Sensing.

[421]  Tieniu Tan,et al.  Skeleton-Based Action Recognition with Spatial Reasoning and Temporal Stack Learning , 2018, ECCV.

[422]  Bowen Zhang,et al.  Real-Time Action Recognition with Enhanced Motion Vector CNNs , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[423]  T. Delbruck,et al.  [title unavailable] , 2022.

[424]  Jake K. Aggarwal,et al.  View invariant human action recognition using histograms of 3D joints , 2012, 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[425]  Li Ma,et al.  Coupled hidden conditional random fields for RGB-D human action recognition , 2015, Signal Process.

[426]  A.G. Stove,et al.  Modern FMCW radar - techniques and applications , 2004, First European Radar Conference, 2004. EURAD.

[427]  Yann LeCun,et al.  Convolutional Learning of Spatio-temporal Features , 2010, ECCV.

[428]  Georgios Evangelidis,et al.  Skeletal Quads: Human Action Recognition Using Joint Quadruples , 2014, 2014 22nd International Conference on Pattern Recognition.

[429]  Ahmet Burak Can,et al.  Recognition of Basic Human Actions using Depth Information , 2014, Int. J. Pattern Recognit. Artif. Intell.

[430]  Arif Mahmood,et al.  Action Classification with Locality-Constrained Linear Coding , 2014, 2014 22nd International Conference on Pattern Recognition.

[431]  Nasser Kehtarnavaz,et al.  A Real-Time Human Action Recognition System Using Depth and Inertial Sensor Fusion , 2016, IEEE Sensors Journal.

[432]  Lu Yang,et al.  Combing RGB and Depth Map Features for human activity recognition , 2012, Proceedings of The 2012 Asia Pacific Signal and Information Processing Association Annual Summit and Conference.

[433]  Jiebo Luo,et al.  Recognizing realistic actions from videos “in the wild” , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.