Cross-View Action Recognition Based on Hierarchical View-Shared Dictionary Learning

Recognizing human actions across different views is challenging, since observations of the same action often vary greatly with viewpoint. To address this problem, most existing methods explore the cross-view feature transfer relationship at the video level only, ignoring the sequential composition of the action segments within each video. In this paper, we propose a novel hierarchical transfer framework based on an action temporal-structure model that encodes the sequential relationships between action segments at multiple timescales. The framework can therefore capture the view invariance of these sequential relationships in segment-level transfer. Additionally, we observe that the original feature distributions under different views differ greatly, leading to view-dependent representations that are irrelevant to the intrinsic structure of actions. Thus, at each level of the proposed framework, we transform the original feature spaces of the different views into a view-shared low-dimensional feature space and jointly learn a dictionary in this space for these views. This view-shared dictionary captures the common structure of action data across the views and can represent the action segments in a way that is robust to view changes. Moreover, the proposed method can be kernelized easily and operates in both unsupervised and supervised cross-view scenarios. Extensive experimental results on the IXMAS and WVU datasets demonstrate the superiority of the proposed method over state-of-the-art methods.
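As a rough illustration of the view-shared dictionary idea described above, the following minimal sketch projects the features of two views into a common low-dimensional space and learns a single dictionary over the pooled projections. It is not the method of the paper: the per-view PCA projection, the top-correlation sparse coding step, the least-squares dictionary update, and all names and parameters (`pca_projection`, `sparse_codes`, `learn_shared_dictionary`, `k`, `m`, `s`) are assumptions chosen for brevity, whereas the paper learns the transforms and the dictionary jointly.

```python
import numpy as np

def pca_projection(X, k):
    """Return a k x d projection from the top-k principal directions of X (d x n)."""
    Xc = X - X.mean(axis=1, keepdims=True)
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k].T

def sparse_codes(D, Z, s):
    """Crude sparse coding (an assumption, not OMP): for each sample keep the
    s atoms with the largest absolute correlation, then solve least squares
    restricted to that support."""
    m, n = D.shape[1], Z.shape[1]
    A = np.zeros((m, n))
    corr = np.abs(D.T @ Z)
    for j in range(n):
        support = np.argsort(corr[:, j])[-s:]
        coef, *_ = np.linalg.lstsq(D[:, support], Z[:, j], rcond=None)
        A[support, j] = coef
    return A

def learn_shared_dictionary(views, k=8, m=16, s=3, iters=10, seed=0):
    """Project each view to k dimensions and learn one dictionary (k x m)
    over the pooled projected samples by alternating minimization."""
    rng = np.random.default_rng(seed)
    # Hypothetical stand-in for the learned view transforms: per-view PCA.
    projections = [pca_projection(X, k) for X in views]
    Z = np.hstack([P @ (X - X.mean(axis=1, keepdims=True))
                   for P, X in zip(projections, views)])
    D = rng.standard_normal((k, m))
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(iters):
        A = sparse_codes(D, Z, s)
        D = Z @ np.linalg.pinv(A)              # least-squares dictionary update
        dead = np.linalg.norm(D, axis=0) < 1e-8
        D[:, dead] = rng.standard_normal((k, int(dead.sum())))  # re-seed unused atoms
        D /= np.linalg.norm(D, axis=0, keepdims=True)
    return projections, D, sparse_codes(D, Z, s)

# Synthetic two-view data: the same latent samples seen through different linear maps.
rng = np.random.default_rng(1)
latent = rng.standard_normal((8, 40))
views = [rng.standard_normal((20, 8)) @ latent,
         rng.standard_normal((30, 8)) @ latent]
projections, D, A = learn_shared_dictionary(views)
```

Here each column of `A` is a sparse code over the shared atoms in `D`, so segments from either view are described in the same vocabulary. A hierarchical variant would repeat this step separately for segments at each timescale.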
