Local Descriptors for Spatio-temporal Recognition

This paper presents and investigates a set of local space-time descriptors for representing and recognizing motion patterns in video. Following the idea of local features in the spatial domain, we use the notion of space-time interest points and represent video data in terms of local space-time events. To describe such events, we define several types of image descriptors over local spatio-temporal neighborhoods and evaluate these descriptors in the context of recognizing human activities. In particular, we compare motion representations in terms of spatio-temporal jets, position dependent histograms, position independent histograms, and principal component analysis computed for either spatio-temporal gradients or optic flow. An experimental evaluation on a video database with human actions shows that high classification performance can be achieved, and that there is a clear advantage of using local position dependent histograms, consistent with previously reported findings regarding spatial recognition.

[1]  Takeo Kanade,et al.  An Iterative Image Registration Technique with an Application to Stereo Vision , 1981, IJCAI.

[2]  Editors , 1986, Brain Research Bulletin.

[3]  T. Lindeberg,et al.  Shape-adapted smoothing in estimation of 3-D depth cues from affine distortions of local 2-D structure , 1997 .

[4]  Mubarak Shah,et al.  Motion-Based Recognition , 1997, Computational Imaging and Vision.

[5]  Tony Lindeberg,et al.  Shape-adapted smoothing in estimation of 3-D shape cues from affine deformations of local 2-D brightness structure , 1997, Image Vis. Comput..

[6]  Hans-Hellmut Nagel,et al.  Spatiotemporally Adaptive Estimation and Segmenation of OF-Fields , 1998, ECCV.

[7]  Bernd Neumann,et al.  Computer Vision — ECCV’98 , 1998, Lecture Notes in Computer Science.

[8]  Dariu Gavrila,et al.  The Visual Analysis of Human Movement: A Survey , 1999, Comput. Vis. Image Underst..

[9]  Michael J. Black,et al.  Parameterized Modeling and Recognition of Activities , 1999, Comput. Vis. Image Underst..

[10]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[11]  James L. Crowley,et al.  A Probabilistic Sensor for the Perception and Recognition of Activities , 2000, ECCV.

[12]  Jesse Hoey,et al.  Representation and recognition of complex human motion , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[13]  Christopher M. Bishop,et al.  Non-linear Bayesian Image Modelling , 2000, ECCV.

[14]  James W. Davis,et al.  The Recognition of Human Movement Using Temporal Templates , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[15]  Lihi Zelnik-Manor,et al.  Event-based analysis of video , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[16]  Mads Nielsen,et al.  Computer Vision — ECCV 2002 , 2002, Lecture Notes in Computer Science.

[17]  Tony Lindeberg,et al.  Time-Recursive Velocity-Adapted Spatio-Temporal Scale-Space Filters , 2002, ECCV.

[18]  Cordelia Schmid,et al.  An Affine Invariant Interest Point Detector , 2002, ECCV.

[19]  T. Lindeberg,et al.  Velocity-adapted spatio-temporal receptive fields for direct recognition of activities , 2002 .

[20]  Pietro Perona,et al.  Object class recognition by unsupervised scale-invariant learning , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[21]  Jitendra Malik,et al.  Recognizing action at a distance , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[22]  Patrick Bouthemy,et al.  Motion Recognition Using Nonparametric Image Motion Models Estimated from Temporal and Multiscale Cooccurrence Statistics , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[23]  Ivan Laptev,et al.  On Space-Time Interest Points , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[24]  J. Koenderink,et al.  Representation of local geometry in the visual system , 1987, Biological Cybernetics.

[25]  Ivan Laptev,et al.  Velocity adaptation of spatio-temporal receptive fields for direct recognition of activities: an experimental study , 2004, Image Vis. Comput..

[26]  Yan Ke,et al.  PCA-SIFT: a more distinctive representation for local image descriptors , 2004, CVPR 2004.

[27]  Tony Lindeberg,et al.  Feature Detection with Automatic Scale Selection , 1998, International Journal of Computer Vision.

[28]  Barbara Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[29]  Bernt Schiele,et al.  Recognition without Correspondence using Multidimensional Receptive Field Histograms , 2004, International Journal of Computer Vision.

[30]  Ivan Laptev,et al.  Velocity adaptation of space-time interest points , 2004, ICPR 2004.

[31]  Michael J. Black,et al.  EigenTracking: Robust Matching and Tracking of Articulated Objects Using a View-Based Representation , 1996, International Journal of Computer Vision.

[32]  Cordelia Schmid,et al.  A Performance Evaluation of Local Descriptors , 2005, IEEE Trans. Pattern Anal. Mach. Intell..