Human behaviour recognition with mid-level representations for crowd understanding and analysis

Funding information: National Natural Science Foundation of China, Grant/Award Number: 62076199; China Postdoctoral Science Foundation, Grant/Award Number: 2019M653784; National Key R&D Program of China, Grant/Award Number: 2017YFB0502900; CAS Light of West China Program, Grant/Award Number: XAB2017B15; Key Laboratory of Spectral Imaging Technology of Chinese Academy of Sciences, Grant/Award Number: LSIT201801D

Abstract

Crowd understanding and analysis have received increasing attention for the past couple of decades, and advances in human behaviour recognition strongly support their applications. Human behaviour recognition typically seeks to automatically analyse the ongoing movements and actions in previously unseen video clips or image sequences, captured from different camera views, using various machine learning methods. Compared with other data modalities such as documents and images, video data demand far more computational and storage resources. This work explores the idea of representing human actions in videos with mid-level semantic concepts, and argues that these semantic attributes enable more descriptive methods for human action recognition. The mid-level attributes are initialized by clustering and built upon low-level features, and they fully exploit the discrepancies between action classes, so that they capture the importance of each attribute for each action class. As a result, the representation is semantically rich and highly discriminative, even when paired with simple linear classifiers. The method is evaluated on three challenging datasets (KTH, UCF50 and HMDB51), and the experimental results show that it outperforms the baseline methods on human action recognition.
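The pipeline summarized in the abstract can be illustrated with a minimal sketch: mid-level attributes initialized by clustering low-level features, a per-class importance for each attribute, and a simple linear classifier. This is not the authors' method. The Gaussian similarity used as the attribute response, the "class mean minus global mean" importance weighting, and all function names (`kmeans`, `attribute_response`, `class_attribute_importance`, `fit`, `predict`) are hypothetical choices made for illustration. The toy 2-D points stand in for real low-level video descriptors such as bag-of-features histograms.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; the k centroids serve as initial mid-level attributes."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            groups[j].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


def attribute_response(x, centroids, sigma=1.0):
    """Hypothetical mid-level representation: Gaussian similarity of x to each attribute."""
    return [math.exp(-sq_dist(x, c) / (2 * sigma ** 2)) for c in centroids]


def class_attribute_importance(responses, labels):
    """Hypothetical weighting: deviation of each class's mean attribute response
    from the global mean, i.e. how much each attribute matters for each class."""
    n_attr = len(responses[0])
    global_mean = [sum(r[j] for r in responses) / len(responses) for j in range(n_attr)]
    weights = {}
    for c in sorted(set(labels)):
        rows = [r for r, y in zip(responses, labels) if y == c]
        class_mean = [sum(r[j] for r in rows) / len(rows) for j in range(n_attr)]
        weights[c] = [m - g for m, g in zip(class_mean, global_mean)]
    return weights


def fit(features, labels, k):
    """Cluster low-level features into k attributes, then weight them per class."""
    centroids = kmeans(features, k)
    responses = [attribute_response(x, centroids) for x in features]
    return centroids, class_attribute_importance(responses, labels)


def predict(x, centroids, weights):
    """Linear classifier on the attribute representation: argmax_c w_c . a(x)."""
    a = attribute_response(x, centroids)
    return max(weights, key=lambda c: sum(w * v for w, v in zip(weights[c], a)))


if __name__ == "__main__":
    walk = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.2), (0.1, -0.2)]
    run = [(5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.2)]
    cents, w = fit(walk + run, ["walk"] * 4 + ["run"] * 4, k=2)
    print(predict((0.05, 0.05), cents, w))  # walk
    print(predict((5.0, 5.0), cents, w))  # run
```

Because each class score is a dot product between a weight vector and the attribute responses, the final classifier stays linear. Any discriminative power therefore comes from the mid-level representation, which is the property the abstract emphasizes.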
