Arbitrary view action recognition via transfer dictionary learning on synthetic training data

Human action recognition is an important problem in robotic vision. Traditional recognition algorithms usually require knowledge of the viewing angle, which is not always available in robotic applications such as active vision. In this paper, we propose a new framework to recognize actions from arbitrary views. A key feature of our algorithm is that view-invariance is learned from synthetic 2D and 3D training data using transfer dictionary learning. This guarantees the availability of training data and removes the hassle of capturing real-world video from specific viewing angles. The result of the process is a dictionary that projects real-world 2D video into a view-invariant sparse representation, which in turn facilitates the training of a view-invariant classifier. Experimental results on the IXMAS and N-UCLA datasets show significant improvements over existing algorithms.
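The core idea of projecting paired 2D and 3D features into a shared sparse representation can be illustrated with a generic coupled dictionary-learning sketch. This is not the paper's exact method; it is a minimal illustration using scikit-learn, assuming synthetic random features in place of real video descriptors, and a simple "stack paired views, learn one joint dictionary, encode test data with the 2D sub-dictionary" scheme:

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning, sparse_encode

# Hypothetical stand-ins for paired synthetic features:
# X2 = 2D (projected) descriptors, X3 = 3D descriptors of the same actions.
rng = np.random.default_rng(0)
n, d2, d3, k = 60, 12, 16, 10
X2 = rng.standard_normal((n, d2))
X3 = rng.standard_normal((n, d3))

# Coupled learning: stacking the paired views forces both to share
# one sparse code per sample under a single joint dictionary.
joint = np.hstack([X2, X3])
dl = DictionaryLearning(n_components=k, alpha=1.0, max_iter=20, random_state=0)
codes = dl.fit_transform(joint)          # shared sparse codes, shape (n, k)

# At test time only 2D video is available, so encode with the
# 2D sub-dictionary; the codes live in the same shared space.
D = dl.components_                       # joint dictionary, shape (k, d2 + d3)
D2 = D[:, :d2]                           # 2D sub-dictionary
codes_test = sparse_encode(X2, D2, alpha=1.0)
```

A classifier trained on `codes` can then be applied to `codes_test`, since both are expressed over the same atoms; the choice of sparsity penalty `alpha` and dictionary size `k` would need tuning on real features.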
