Cross-View Action Recognition Based on a Statistical Translation Framework

Actions captured under view changes pose serious challenges to modern action recognition methods. In this paper, we propose an effective approach for cross-view action recognition based on a statistical translation framework, which boils down to estimation of visual word transfer probabilities across views. Specifically, local features are extracted from action video frames and form bags of words based on k-means clustering. Though the appearance of an action may vary due to view changes, the underlying transfer tendency between visual words across views can be exploited. We propose two methods to measure the visual-word-based transfer relationship that are eventually based on frequency counts of word pairs. In the first method, word transfer probabilities are estimated by maximizing the likelihood of a shared action set with the EM algorithm. In the second method, word transfer probabilities are estimated by using likelihood-ratio tests. The two methods achieve comparable results and perform better when they are combined. For cross-view action classification, we compute action transfer probabilities based on the estimated word transfer probabilities and then implement a K-NN-like classification based on action video transfer probabilities. We verified our method on the public multiview IXMAS dataset and the WVU dataset. Promising results are obtained compared with state-of-the-art methods.

[1]  Chunheng Wang,et al.  Cross-View Action Recognition via a Continuous Virtual Path , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  Serge J. Belongie,et al.  Behavior recognition via sparse spatio-temporal features , 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[3]  Patrick Pérez,et al.  View-Independent Action Recognition from Temporal Self-Similarities , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  James W. Davis,et al.  The Recognition of Human Movement Using Temporal Templates , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[5]  Song-Chun Zhu,et al.  Learning AND-OR Templates for Object Recognition and Detection , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Patrick Pérez,et al.  Cross-View Action Recognition from Temporal Self-similarities , 2008, ECCV.

[7]  Ted Dunning,et al.  Accurate Methods for the Statistics of Surprise and Coincidence , 1993, CL.

[8]  Larry S. Davis,et al.  3-D model-based tracking of humans in action: a multi-view approach , 1996, Proceedings CVPR IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[9]  I. Patras,et al.  Spatiotemporal salient points for visual recognition of human actions , 2005, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[10]  Greg Mori,et al.  Action recognition by learning mid-level motion features , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Ying Wu,et al.  Cross-View Action Modeling, Learning, and Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Rémi Ronfard,et al.  Free viewpoint action recognition using motion history volumes , 2006, Comput. Vis. Image Underst..

[13]  Larry S. Davis,et al.  Representing Videos Using Mid-level Discriminative Patches , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Ali Farhadi,et al.  Learning to Recognize Activities from the Wrong View Point , 2008, ECCV.

[15]  Yiannis Aloimonos,et al.  View-Invariant Modeling and Recognition of Human Actions Using Grammars , 2006, WDV.

[16]  Ivan Laptev,et al.  On Space-Time Interest Points , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[17]  Wang Chuanx On Temporal Order Invariance for View-invariant Action Recognition , 2015 .

[18]  Ying Wu,et al.  Action recognition with multiscale spatio-temporal contexts , 2011, CVPR 2011.

[19]  Rémi Ronfard,et al.  Action Recognition from Arbitrary Views using 3D Exemplars , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[20]  Fernando De la Torre,et al.  Joint segmentation and classification of human actions in video , 2011, CVPR 2011.

[21]  Yang Wang,et al.  Human Action Recognition by Semilatent Topic Models , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22]  Junji Yamato,et al.  Recognizing human action in time-sequential images using hidden Markov model , 1992, Proceedings 1992 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[23]  Thomas B. Moeslund,et al.  A selective spatio-temporal interest point detector for human action recognition in complex scenes , 2011, 2011 International Conference on Computer Vision.

[24]  乔宇 Motionlets: Mid-Level 3D Parts for Human Motion Recognition , 2013 .

[25]  Cordelia Schmid,et al.  Action recognition by dense trajectories , 2011, CVPR 2011.

[26]  Dong Xu,et al.  Action recognition using context and appearance distribution features , 2011, CVPR 2011.

[27]  Shaogang Gong,et al.  Fusing appearance and distribution information of interest points for action recognition , 2012, Pattern Recognit..

[28]  Cordelia Schmid,et al.  Action Recognition with Improved Trajectories , 2013, 2013 IEEE International Conference on Computer Vision.

[29]  Robert C. Moore On Log-Likelihood-Ratios and the Significance of Rare Events , 2004, EMNLP.

[30]  Ronen Basri,et al.  Actions as Space-Time Shapes , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31]  Mubarak Shah,et al.  Human Action Recognition in Videos Using Kinematic Features and Multiple Instance Learning , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[32]  Mohiuddin Ahmad,et al.  HMM-based Human Action Recognition Using Multiview Image Sequences , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[33]  Cordelia Schmid,et al.  Dense Trajectories and Motion Boundary Descriptors for Action Recognition , 2013, International Journal of Computer Vision.

[34]  Takeo Kanade,et al.  An Iterative Image Registration Technique with an Application to Stereo Vision , 1981, IJCAI.

[35]  Larry S. Davis,et al.  Recognizing actions by shape-motion prototype trees , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[36]  Ruonan Li,et al.  Discriminative virtual views for cross-view action recognition , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[37]  Robert C. Moore Improving IBM Word Alignment Model 1 , 2004, ACL.

[38]  Mubarak Shah,et al.  Actions sketch: a novel action representation , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[39]  Ramakant Nevatia,et al.  Single View Human Action Recognition using Key Pose Matching and Viterbi Path Searching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[40]  Yupin Luo,et al.  Making full use of spatial-temporal interest points: An AdaBoost approach for action recognition , 2010, 2010 IEEE International Conference on Image Processing.

[41]  Ramakant Sharma,et al.  Phylogeny Estimation and Hypothesis Testing using Maximum Likelihood , 2003 .

[42]  Iasonas Kokkinos,et al.  Discovering discriminative action parts from mid-level video representations , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[43]  Robert L. Mercer,et al.  The Mathematics of Statistical Machine Translation: Parameter Estimation , 1993, CL.

[44]  Christopher Joseph Pal,et al.  Activity recognition using the velocity histories of tracked keypoints , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[45]  Jintao Li,et al.  Hierarchical spatio-temporal context modeling for action recognition , 2009, CVPR.

[46]  Zhiwu Lu,et al.  Spectral learning of latent semantics for action recognition , 2011, 2011 International Conference on Computer Vision.

[47]  Jing Wang,et al.  Cross-View Action Recognition Based on Statistical Machine Translation , 2012, CCBR.

[48]  Tieniu Tan,et al.  A Discriminative Model of Motion and Cross Ratio for View-Invariant Action Recognition , 2012, IEEE Transactions on Image Processing.

[49]  Du Tran,et al.  Human Activity Recognition with Metric Learning , 2008, ECCV.

[50]  Venu Govindaraju,et al.  A generative framework to investigate the underlying patterns in human activities , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[51]  Binlong Li,et al.  Activity recognition using dynamic subspace angles , 2011, CVPR 2011.

[52]  Ali Farhadi,et al.  A latent model of discriminative aspect , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[53]  Silvio Savarese,et al.  Cross-view action recognition via view knowledge transfer , 2011, CVPR 2011.

[54]  Jitendra Malik,et al.  Recognizing action at a distance , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[55]  Zhuowen Tu,et al.  Action Recognition with Actons , 2013, 2013 IEEE International Conference on Computer Vision.