Cross-View Action Recognition Based on Statistical Machine Translation

In this paper, we propose an approach to recognizing human actions across different views within a knowledge transfer framework. Each frame in an action is considered as a sentence in an article. Although the appearance of the same action differs considerably between views, we assume that a translation relationship exists between them. To model this relationship, we use IBM Model 1 from statistical machine translation, which allows the translation probabilities from vocabulary words in the source view to those in the target view to be estimated from the training data. Consequently, we can translate an action based on the maximum a posteriori criterion. We validated our method on the public multi-view IXMAS dataset and obtained promising results compared with state-of-the-art knowledge-transfer-based methods.
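As a rough illustration of the translation step, the sketch below trains a generic IBM Model 1 with expectation-maximization on paired token sequences and scores a candidate target sequence under the learned table. It is not the authors' implementation: treating each frame as a list of discrete visual-word tokens, the function names `train_ibm_model1` and `log_likelihood`, the toy tokens, and the omission of the NULL source word are all assumptions made for brevity.

```python
import math
from collections import defaultdict


def train_ibm_model1(pairs, iterations=20):
    """Estimate t(target_word | source_word) with EM.

    pairs: list of (source_tokens, target_tokens), e.g. the visual words of
    the same frame seen from the source view and from the target view
    (an assumed encoding, not taken from the paper).
    Returns a dict keyed by (target_word, source_word).
    """
    tgt_vocab = {t for _, tgt in pairs for t in tgt}
    # Uniform initialization over word pairs that co-occur at least once.
    t_prob = {(t, s): 1.0 / len(tgt_vocab)
              for src, tgt in pairs for s in src for t in tgt}

    for _ in range(iterations):
        count = defaultdict(float)   # expected count of (t, s) alignments
        total = defaultdict(float)   # expected count of s being aligned
        # E-step: distribute each target word over the source words.
        for src, tgt in pairs:
            for t in tgt:
                norm = sum(t_prob[(t, s)] for s in src)
                for s in src:
                    delta = t_prob[(t, s)] / norm
                    count[(t, s)] += delta
                    total[s] += delta
        # M-step: renormalize so that sum_t t(t | s) = 1 for every s.
        t_prob = {(t, s): c / total[s] for (t, s), c in count.items()}
    return t_prob


def log_likelihood(t_prob, src, tgt, floor=1e-12):
    """Model 1 log-probability of tgt given src, up to a length constant."""
    ll = 0.0
    for t in tgt:
        p = sum(t_prob.get((t, s), 0.0) for s in src) / len(src)
        ll += math.log(max(p, floor))  # floor avoids log(0) for unseen pairs
    return ll


# Toy data: hypothetical tokens "a", "b" in the source view and "x", "y"
# in the target view.
toy_pairs = [(["a", "b"], ["x", "y"]), (["a"], ["x"])]
table = train_ibm_model1(toy_pairs)
print(sorted(table.items()))
```

Under a maximum a posteriori rule, one would compare such scores (combined with a prior over candidates) across the candidate target-view descriptions and keep the highest-scoring one; how the candidates and the prior are defined is left open here.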
