Towards a Robust Spatio-Temporal Interest Point Detection for Human Action Recognition

Spatio-temporal salient features are widely used for the compact representation of objects and motions in video, especially for event and action recognition. Existing feature extraction methods have two main problems. First, they work in batch mode and mostly use Gaussian (linear) scale-space filtering for multi-scale feature extraction; this linear filtering blurs the edges and salient motions that should be preserved for robust feature extraction. Second, environmental motion and ego disturbances (e.g., camera shake) are not usually differentiated. These problems result in the detection of false features no matter which saliency criterion is used. To address them, we developed a non-linear (scale-space) filtering approach that prevents both spatial and temporal dislocations. This model provides a non-linear counterpart of the Laplacian of Gaussian to form conceptual structure maps from which multi-scale spatio-temporal salient features are extracted. Preliminary evaluation shows promising results, with false detections being removed.
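As an illustrative aside, the sketch below shows one way a non-linear scale-space can be built over a video volume using Perona-Malik-style anisotropic diffusion, with a difference of two diffused volumes serving as a rough stand-in for a non-linear counterpart of the Laplacian of Gaussian. This is a minimal sketch under stated assumptions, not the paper's implementation: the conductance function, the parameter values (`kappa`, `dt`, iteration counts), the `T x H x W` grayscale volume layout, and the percentile thresholding step are all choices made here for illustration.

```python
import numpy as np

def diffuse_volume(vol, iters=10, kappa=0.1, dt=0.1):
    """Perona-Malik-style non-linear diffusion on a T x H x W video volume.

    Regions with large gradients (edges, strong motion boundaries) get a small
    conductance and are smoothed less than flat regions -- the non-linear
    alternative to Gaussian (linear) scale-space filtering. Parameters are
    illustrative, not the paper's settings.
    """
    u = vol.astype(np.float64).copy()
    for _ in range(iters):
        # forward differences along t, y, x (replicated at the far border)
        grads = [np.diff(u, axis=a, append=u.take([-1], axis=a)) for a in range(3)]
        # edge-stopping conductance: small where the gradient magnitude is large
        flux = [g * np.exp(-(g / kappa) ** 2) for g in grads]
        # divergence of the flux via backward differences
        div = sum(np.diff(f, axis=a, prepend=f.take([0], axis=a))
                  for a, f in enumerate(flux))
        u += dt * div
    return u

def structure_map(vol, fine_iters=5, coarse_iters=20):
    """Difference of two non-linearly diffused volumes: a band-pass-like
    response used here as a rough analogue of a non-linear LoG."""
    return diffuse_volume(vol, iters=fine_iters) - diffuse_volume(vol, iters=coarse_iters)

if __name__ == "__main__":
    # Toy example: keep the strongest structure-map responses as candidate
    # spatio-temporal interest points (threshold is illustrative).
    rng = np.random.default_rng(0)
    video = rng.random((16, 64, 64))  # assumed T x H x W grayscale volume
    response = np.abs(structure_map(video))
    points = np.argwhere(response > np.percentile(response, 99.5))
    print(f"{len(points)} candidate spatio-temporal interest points")
```

In practice, the non-maximum suppression and saliency criteria applied on top of such a structure map would follow the paper's own design; the thresholding above is only a placeholder for that step.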
