Cross-View Action Recognition from Temporal Self-similarities

This paper concerns recognition of human actions under view changes. We explore self-similarities of action sequences over time and observe the striking stability of such measures across views. Building upon this key observation we develop an action descriptor that captures the structure of temporal similarities and dissimilarities within an action sequence. Despite this descriptor not being strictly view-invariant, we provide intuition and experimental validation demonstrating the high stability of self-similarities under view changes. Self-similarity descriptors are also shown stable under action variations within a class as well as discriminative for action recognition. Interestingly, self-similarities computed from different image features possess similar properties and can be used in a complementary fashion. Our method is simple and requires neither structure recovery nor multi-view correspondence estimation. Instead, it relies on weak geometric properties and combines them with machine learning for efficient cross-view action recognition. The method is validated on three public datasets, it has similar or superior performance compared to related methods and it performs well even in extreme conditions such as when recognizing actions from top views while using side views for training only.

[1]  Takeo Kanade,et al.  An Iterative Image Registration Technique with an Application to Stereo Vision , 1981, IJCAI.

[2]  S. Lele Euclidean Distance Matrix Analysis (EDMA): Estimation of mean form and mean form difference , 1993 .

[3]  J. Tenenbaum,et al.  A global geometric framework for nonlinear dimensionality reduction. , 2000, Science.

[4]  Xavier Binefa,et al.  Robust Real-Time Periodic Motion Detection, Analysis, and Applications , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[5]  Mubarak Shah,et al.  A View-Invariant Representation of Human Action , 2000 .

[6]  Stefan Carlsson Recognizing Walking People , 2000, ECCV.

[7]  M. Alex O. Vasilescu,et al.  Recognizing action events from multiple viewpoints , 2001, Proceedings IEEE Workshop on Detection and Recognition of Events in Video.

[8]  Tieniu Tan,et al.  Recent developments in human motion analysis , 2003, Pattern Recognit..

[9]  Mubarak Shah,et al.  View-Invariant Representation and Recognition of Actions , 2002, International Journal of Computer Vision.

[10]  Larry S. Davis,et al.  Gait Recognition Using Image Self-Similarity , 2004, EURASIP J. Adv. Signal Process..

[11]  Ronen Basri,et al.  Actions as space-time shapes , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[12]  Mubarak Shah,et al.  Actions sketch: a novel action representation , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[13]  Mubarak Shah,et al.  Recognizing human actions in videos acquired by uncalibrated moving cameras , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[14]  Rama Chellappa,et al.  View Invariance for Human Action Recognition , 2005, International Journal of Computer Vision.

[15]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[16]  Adrian Hilton,et al.  A survey of advances in vision-based human motion capture and analysis , 2006, Comput. Vis. Image Underst..

[17]  Juan Carlos Niebles,et al.  Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words , 2006, BMVC.

[18]  Yiannis Aloimonos,et al.  View-Invariant Modeling and Recognition of Human Actions Using Grammars , 2006, WDV.

[19]  Rémi Ronfard,et al.  Free viewpoint action recognition using motion history volumes , 2006, Comput. Vis. Image Underst..

[20]  Mubarak Shah,et al.  Chaotic Invariants for Human Action Recognition , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[21]  Rama Chellappa,et al.  Epitomic Representation of Human Activities , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Cordelia Schmid,et al.  Semantic Hierarchies for Visual Object Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Eli Shechtman,et al.  Matching Local Self-Similarities across Images and Videos , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Pinar Duygulu Sahin,et al.  Human Action Recognition Using Distribution of Oriented Rectangular Patches , 2007, Workshop on Human Motion.

[25]  Barbara Caputo,et al.  Local velocity-adapted motion events for spatio-temporal recognition , 2007, Comput. Vis. Image Underst..

[26]  Rui Li,et al.  Simultaneous Learning of Nonlinear Manifold and Dynamical Models for High-dimensional Time Series , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[27]  Cordelia Schmid,et al.  Learning Object Representations for Visual Object Class Recognition , 2007, ICCV 2007.

[28]  Rémi Ronfard,et al.  Action Recognition from Arbitrary Views using 3D Exemplars , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[29]  Hassan Foroosh,et al.  View-invariant action recognition using fundamental ratios , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.