View-Independent Action Recognition from Temporal Self-Similarities

This paper addresses recognition of human actions under view changes. We explore self-similarities of action sequences over time and observe the striking stability of such measures across views. Building upon this key observation, we develop an action descriptor that captures the structure of temporal similarities and dissimilarities within an action sequence. Despite this temporal self-similarity descriptor not being strictly view-invariant, we provide intuition and experimental validation demonstrating its high stability under view changes. Self-similarity descriptors are also shown to be stable under performance variations within a class of actions when individual speed fluctuations are ignored. If required, such fluctuations between two different instances of the same action class can be explicitly recovered with dynamic time warping, as will be demonstrated, to achieve cross-view action synchronization. More central to the current work, temporal ordering of local self-similarity descriptors can simply be ignored within a bag-of-features type of approach. Sufficient action discrimination is still retained in this way to build a view-independent action recognition system. Interestingly, self-similarities computed from different image features possess similar properties and can be used in a complementary fashion. Our method is simple and requires neither structure recovery nor multiview correspondence estimation. Instead, it relies on weak geometric properties and combines them with machine learning for efficient cross-view action recognition. The method is validated on three public data sets. It has similar or superior performance compared to related methods and it performs well even in extreme conditions, such as when recognizing actions from top views while using side views only for training.

[1]  Takeo Kanade,et al.  An Iterative Image Registration Technique with an Application to Stereo Vision , 1981, IJCAI.

[2]  D. Ruelle,et al.  Recurrence Plots of Dynamical Systems , 1987 .

[3]  S. Lele Euclidean Distance Matrix Analysis (EDMA): Estimation of mean form and mean form difference , 1993 .

[4]  Carlo Tomasi,et al.  Good features to track , 1994, 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[5]  G. McGuire,et al.  Recurrence matrices and the preservation of dynamical properties , 1997 .

[6]  E. Bradley,et al.  Recurrence plots of experimental data: To embed or not to embed? , 1998, Chaos.

[7]  Xavier Binefa,et al.  Robust Real-Time Periodic Motion Detection, Analysis, and Applications , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[8]  Stefan Carlsson Recognizing Walking People , 2000, ECCV.

[9]  M. Alex O. Vasilescu,et al.  Recognizing action events from multiple viewpoints , 2001, Proceedings IEEE Workshop on Detection and Recognition of Events in Video.

[10]  James W. Davis,et al.  The Recognition of Human Movement Using Temporal Templates , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[11]  E. Bradley,et al.  Recurrence plots and unstable periodic orbits. , 2002, Chaos.

[12]  Joaquim Marques de Sá,et al.  Applied Statistics Using SPSS, STATISTICA, MATLAB and R , 2003 .

[13]  Tieniu Tan,et al.  Recent developments in human motion analysis , 2003, Pattern Recognit..

[14]  Ivan Laptev,et al.  On Space-Time Interest Points , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[15]  Barbara Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, ICPR 2004.

[16]  Mubarak Shah,et al.  View-Invariant Representation and Recognition of Actions , 2002, International Journal of Computer Vision.

[17]  Larry S. Davis,et al.  Gait Recognition Using Image Self-Similarity , 2004, EURASIP J. Adv. Signal Process..

[18]  B. Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[19]  Ronen Basri,et al.  Actions as space-time shapes , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[20]  Eli Shechtman,et al.  Space-time behavior based correlation , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[21]  Mubarak Shah,et al.  Actions sketch: a novel action representation , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[22]  Cristian Sminchisescu,et al.  Conditional models for contextual human motion recognition , 2006, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[23]  Serge J. Belongie,et al.  Behavior recognition via sparse spatio-temporal features , 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[24]  Mubarak Shah,et al.  Recognizing human actions in videos acquired by uncalibrated moving cameras , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[25]  Rama Chellappa,et al.  View Invariance for Human Action Recognition , 2005, International Journal of Computer Vision.

[26]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[27]  Trevor Darrell,et al.  Hidden Conditional Random Fields for Gesture Recognition , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[28]  Mohiuddin Ahmad,et al.  HMM-based Human Action Recognition Using Multiview Image Sequences , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[29]  Adrian Hilton,et al.  A survey of advances in vision-based human motion capture and analysis , 2006, Comput. Vis. Image Underst..

[30]  Juan Carlos Niebles,et al.  Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words , 2006, BMVC.

[31]  Yiannis Aloimonos,et al.  View-Invariant Modeling and Recognition of Human Actions Using Grammars , 2006, WDV.

[32]  Rémi Ronfard,et al.  Free viewpoint action recognition using motion history volumes , 2006, Comput. Vis. Image Underst..

[33]  Ramakant Nevatia,et al.  Single View Human Action Recognition using Key Pose Matching and Viterbi Path Searching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  Mubarak Shah,et al.  Chaotic Invariants for Human Action Recognition , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[35]  Jürgen Kurths,et al.  Recurrence plots for the analysis of complex systems , 2009 .

[36]  Liang Wang,et al.  Recognizing Human Activities from Silhouettes: Motion Subspace and Factorial Discriminative Graphical Model , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[37]  Eli Shechtman,et al.  Matching Local Self-Similarities across Images and Videos , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[38]  Pinar Duygulu Sahin,et al.  Human Action Recognition Using Distribution of Oriented Rectangular Patches , 2007, Workshop on Human Motion.

[39]  Barbara Caputo,et al.  Local velocity-adapted motion events for spatio-temporal recognition , 2007, Comput. Vis. Image Underst..

[40]  Rui Li,et al.  Simultaneous Learning of Nonlinear Manifold and Dynamical Models for High-dimensional Time Series , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[41]  Cordelia Schmid,et al.  Learning Object Representations for Visual Object Class Recognition , 2007, ICCV 2007.

[42]  Rémi Ronfard,et al.  Action Recognition from Arbitrary Views using 3D Exemplars , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[43]  Trevor Darrell,et al.  Latent-Dynamic Discriminative Models for Continuous Gesture Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[44]  Dit-Yan Yeung,et al.  Human action recognition using Local Spatio-Temporal Discriminant Embedding , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[45]  Cordelia Schmid,et al.  Learning realistic human actions from movies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[46]  Andrew Gilbert,et al.  Scale Invariant Action Recognition Using Compound Features Mined from Dense Spatio-temporal Corners , 2008, ECCV.

[47]  Hassan Foroosh,et al.  View-invariant action recognition using fundamental ratios , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[48]  Franziska Meier,et al.  3D Shape Context and Distance Transform for action recognition , 2008, 2008 19th International Conference on Pattern Recognition.

[49]  Edmond Boyer,et al.  Action recognition using exemplar-based embedding , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[50]  Ali Farhadi,et al.  Learning to Recognize Activities from the Wrong View Point , 2008, ECCV.

[51]  Mubarak Shah,et al.  Learning 4D action feature models for arbitrary view action recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[52]  Ramakant Nevatia,et al.  View and scale invariant action recognition using multiview shape-flow models , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[53]  Cordelia Schmid,et al.  Evaluation of Local Spatio-temporal Features for Action Recognition , 2009, BMVC.