Dual-codebook learning and hierarchical transfer for cross-view action recognition

Abstract. We focus on the challenging problem of cross-view action recognition. The key to this problem is finding the correspondence between two different views, which we establish in two stages. In the first stage, we construct a dual-codebook for the two views, consisting of one codebook per view. Each codeword in one codebook has a corresponding codeword in the other, whereas traditional methods build an independent codebook for each view. To derive the dual-codebook, we propose an effective co-clustering algorithm based on semi-nonnegative matrix factorization. Additionally, unlike most other methods, which represent actions in one view using only the codebook of that view, we also exploit the codebook-specific information from the other view. To this end, we construct mapped-codebooks via codebook transformation, complementing the codebook-to-codebook correspondence within the dual-codebook. In the second stage, observing that the temporal relationship between action segments within an action is view invariant, we propose a hierarchical transfer framework based on a temporal structure that captures this action-segment temporal relationship at multiple timescales and is more discriminative than the usual video-level transfer strategy. Extensive experimental results on the INRIA Xmas Motion Acquisition Sequences (IXMAS) and West Virginia University (WVU) datasets demonstrate the superiority of the proposed method over state-of-the-art approaches.
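For readers unfamiliar with the factorization named in the abstract, the following is a minimal sketch of plain semi-nonnegative matrix factorization, using the multiplicative updates of Ding et al. It is not the paper's co-clustering algorithm: the abstract does not specify how the two views are coupled, so the sketch factorizes a single feature matrix only. The function name `semi_nmf`, its parameters, and the toy data are assumptions made for illustration. In this setting the columns of `F` can be read as codewords (cluster centroids, of unrestricted sign) and the rows of `G` as nonnegative soft assignments of samples to codewords.

```python
import numpy as np


def semi_nmf(X, k, n_iter=200, eps=1e-9, seed=0):
    """Factorize X (p x n) as F @ G.T with G >= 0 and F unconstrained.

    Hypothetical helper for illustration; follows the standard semi-NMF
    multiplicative updates, not the paper's dual-codebook procedure.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    # Strictly positive start so the multiplicative update can move every entry.
    G = rng.random((n, k)) + 0.1

    def pos(A):
        # Elementwise positive part of A.
        return (np.abs(A) + A) / 2.0

    def neg(A):
        # Elementwise negative part of A, returned as a nonnegative matrix.
        return (np.abs(A) - A) / 2.0

    for _ in range(n_iter):
        # Least-squares update of the basis (codewords) for the current G.
        F = X @ G @ np.linalg.pinv(G.T @ G)
        XtF = X.T @ F
        FtF = F.T @ F
        # Multiplicative update; the ratio is nonnegative, so G stays >= 0.
        num = pos(XtF) + G @ neg(FtF)
        den = neg(XtF) + G @ pos(FtF)
        G *= np.sqrt((num + eps) / (den + eps))

    # Final basis consistent with the returned G.
    F = X @ G @ np.linalg.pinv(G.T @ G)
    return F, G


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Toy data: mixed-sign descriptors (20 dims) for 60 samples.
    X = rng.standard_normal((20, 4)) @ rng.random((4, 60))
    F, G = semi_nmf(X, k=4)
    print("relative error:", np.linalg.norm(X - F @ G.T) / np.linalg.norm(X))
```

Because only `G` is constrained, the input descriptors may contain negative values, which is the usual reason for preferring semi-NMF over standard NMF for clustering real-valued features.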
