Paper Information - Action Recognition From Arbitrary Views Using Transferable Dictionary Learning

Human action recognition is crucial to many practical applications, ranging from human-computer interaction to video surveillance. Most approaches either recognize the human action from a fixed view or require knowledge of the view angle, which is usually not available in practical applications. In this paper, we propose a novel end-to-end framework to jointly learn a view-invariant transfer dictionary and a view-invariant classifier. The result of the process is a dictionary that can project real-world 2D video into a view-invariant sparse representation, and a classifier that recognizes actions from an arbitrary view. The main feature of our algorithm is the use of synthetic data to extract view-invariance between 3D and 2D videos during the pre-training phase. This guarantees the availability of training data and removes the need to obtain real-world videos at specific viewing angles. Additionally, to better describe the actions in 3D videos, we introduce a new feature set called 3D dense trajectories, which effectively encodes the trajectory information extracted from 3D videos. Experimental results on the IXMAS, N-UCLA, i3DPost and UWA3DII data sets show improvements over existing algorithms.
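The inference step described in the abstract (project a video feature onto a learned dictionary to get a sparse code, then classify that code) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the dictionary `D` and the linear classifier `W` are assumed to be already learned, orthogonal matching pursuit is swapped in as a generic sparse coder, and the function names `omp` and `classify` are hypothetical.

```python
import numpy as np


def omp(D, y, n_nonzero):
    """Sparse-code y over dictionary D (columns are atoms) with a generic
    orthogonal matching pursuit; a stand-in, not the paper's solver."""
    y = np.asarray(y, dtype=float)
    residual = y.copy()
    support = []                      # indices of the atoms selected so far
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break
        support.append(k)
        # Re-fit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
        if np.linalg.norm(residual) < 1e-10:
            break
    x = np.zeros(D.shape[1])
    if support:
        x[support] = coef
    return x


def classify(W, x):
    """Assumed linear classifier: return the class whose score W @ x is largest."""
    return int(np.argmax(W @ x))


# Toy usage with made-up data: 5 atoms, 2 action classes.
D = np.eye(5)                                  # hypothetical learned dictionary
y = np.array([0.0, 2.0, 0.0, -3.0, 0.0])       # hypothetical feature vector
W = np.array([[0.0, 1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 1.0, 0.0]])      # hypothetical classifier weights
code = omp(D, y, n_nonzero=2)
label = classify(W, code)
```

In the framework the abstract describes, the dictionary and classifier are learned jointly; the sketch only shows how such learned quantities might be applied at test time.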
