Improved Spatio-temporal Salient Feature Detection for Action Recognition

Spatio-temporal salient features can localize the local motion events and are used to represent video sequences for many computer vision tasks such as action recognition. The robust detection of these features under geometric variations such as affine transformation and view/scale changes is however an open problem. Existing methods use the same filter for both time and space and hence, perform an isotropic temporal filtering. A novel anisotropic temporal filter for better spatio-temporal feature detection is developed. The effect of symmetry and causality of the video filtering is investigated. Based on the positive results of precision and reproducibility tests, we propose the use of temporally asymmetric filtering for robust motion feature detection and action recognition.

[1]  James W. Davis,et al.  The representation and recognition of human movement using temporal templates , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[2]  Eero P. Simoncelli,et al.  A model of neuronal responses in visual area MT , 1998, Vision Research.

[3]  R. Vidal,et al.  Histograms of oriented optical flow and Binet-Cauchy kernels on nonlinear dynamical systems for the recognition of human actions , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Ronen Basri,et al.  Actions as Space-Time Shapes , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  David A. Clausi,et al.  Towards a Robust Spatio-Temporal Interest Point Detection for Human Action Recognition , 2009, 2009 Canadian Conference on Computer and Robot Vision.

[6]  Luc Florack,et al.  Scale-Time Kernels and Models , 2001, Scale-Space.

[7]  Barbara Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[8]  Cordelia Schmid,et al.  A Comparison of Affine Region Detectors , 2005, International Journal of Computer Vision.

[9]  Jiebo Luo,et al.  Recognizing realistic actions from videos “in the wild” , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Quoc V. Le,et al.  Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis , 2011, CVPR 2011.

[11]  Jitendra Malik,et al.  Scale-Space and Edge Detection Using Anisotropic Diffusion , 1990, IEEE Trans. Pattern Anal. Mach. Intell..

[12]  Cordelia Schmid,et al.  Evaluation of Local Spatio-temporal Features for Action Recognition , 2009, BMVC.

[13]  E H Adelson,et al.  Spatiotemporal energy models for the perception of motion. , 1985, Journal of the Optical Society of America. A, Optics and image science.

[14]  Daniel Fagerström,et al.  Spatio-temporal Scale-Spaces , 2007, SSVM.

[15]  Serge J. Belongie,et al.  Behavior recognition via sparse spatio-temporal features , 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[16]  Barbara Caputo,et al.  Local velocity-adapted motion events for spatio-temporal recognition , 2007, Comput. Vis. Image Underst..

[17]  A J Ahumada,et al.  Model of human visual-motion sensing. , 1985, Journal of the Optical Society of America. A, Optics and image science.

[18]  Wulfram Gerstner,et al.  Spiking Neuron Models , 2002 .

[19]  Bo Thiesson,et al.  Image and Video Segmentation by Anisotropic Kernel Mean Shift , 2004, ECCV.

[20]  Mubarak Shah,et al.  Action MACH a spatio-temporal Maximum Average Correlation Height filter for action recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  David Elliott,et al.  In the Wild , 2010 .

[22]  James W. Davis,et al.  The Representation and Recognition of Action Using Temporal Templates , 1997, CVPR 1997.

[23]  Rémi Ronfard,et al.  Free viewpoint action recognition using motion history volumes , 2006, Comput. Vis. Image Underst..

[24]  Mubarak Shah,et al.  Action recognition using spatio-temporal regularity based features , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[25]  J. J. Koenderink,et al.  Scale-time , 1988, Biological Cybernetics.

[26]  Daniel Fagerström,et al.  Temporal Scale Spaces , 2003, International Journal of Computer Vision.

[27]  Cordelia Schmid,et al.  Action recognition by dense trajectories , 2011, CVPR 2011.

[28]  Tony Lindeberg,et al.  Scale-Space with Casual Time Direction , 1996, ECCV.

[29]  Luc Van Gool,et al.  An Efficient Dense and Scale-Invariant Spatio-Temporal Interest Point Detector , 2008, ECCV.

[30]  Mubarak Shah,et al.  A 3-dimensional sift descriptor and its application to action recognition , 2007, ACM Multimedia.

[31]  T. Lindeberg Scale-space with Causal Time Direction , 1996 .

[32]  David A. Clausi,et al.  Human Action Recognition Using Salient Opponent-Based Motion Features , 2010, 2010 Canadian Conference on Computer and Robot Vision.

[33]  Leo Grady,et al.  Random Walks for Image Segmentation , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34]  Joachim Weickert,et al.  A Review of Nonlinear Diffusion Filtering , 1997, Scale-Space.

[35]  Bernhard Burgeth,et al.  Scale Spaces on Lie Groups , 2007, SSVM.

[36]  Thomas Serre,et al.  A Biologically Inspired System for Action Recognition , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[37]  S. Kollias,et al.  Dense saliency-based spatiotemporal feature points for action recognition , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[38]  Cordelia Schmid,et al.  Actions in context , 2009, CVPR.

[39]  A. Hodgkin,et al.  Changes in time scale and sensitivity in the ommatidia of Limulus , 1964, The Journal of physiology.

[40]  Wulfram Gerstner,et al.  SPIKING NEURON MODELS Single Neurons , Populations , Plasticity , 2002 .