Paper Information - Local Descriptors for Spatio-temporal Recognition

Local Descriptors for Spatio-temporal Recognition

This paper presents and investigates a set of local space-time descriptors for representing and recognizing motion patterns in video. Following the idea of local features in the spatial domain, we use the notion of space-time interest points and represent video data in terms of local space-time events. To describe such events, we define several types of image descriptors over local spatio-temporal neighborhoods and evaluate these descriptors in the context of recognizing human activities. In particular, we compare motion representations in terms of spatio-temporal jets, position dependent histograms, position independent histograms, and principal component analysis computed for either spatio-temporal gradients or optic flow. An experimental evaluation on a video database with human actions shows that high classification performance can be achieved, and that there is a clear advantage of using local position dependent histograms, consistent with previously reported findings regarding spatial recognition.
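The distinction the abstract draws between position dependent and position independent histograms can be illustrated with a small sketch. This is not the authors' implementation: the function names `gradients` and `histogram_descriptor`, the parameters `cells`, `n_bins` and `value_range`, the central-difference gradients, and the per-component binning are all assumptions made for illustration. The neighborhood is a small video patch indexed as `vol[t][y][x]`. With `cells=1` every voxel votes into one shared histogram (position independent), while `cells=2` splits the patch into 2x2x2 sub-volumes with one histogram each (position dependent).

```python
def gradients(vol):
    """Central-difference gradients (Lx, Ly, Lt) at interior voxels of vol[t][y][x]."""
    T, H, W = len(vol), len(vol[0]), len(vol[0][0])
    out = {}
    for t in range(1, T - 1):
        for y in range(1, H - 1):
            for x in range(1, W - 1):
                lx = (vol[t][y][x + 1] - vol[t][y][x - 1]) / 2.0
                ly = (vol[t][y + 1][x] - vol[t][y - 1][x]) / 2.0
                lt = (vol[t + 1][y][x] - vol[t - 1][y][x]) / 2.0
                out[(t, y, x)] = (lx, ly, lt)
    return out, (T, H, W)


def histogram_descriptor(vol, cells=2, n_bins=4, value_range=1.0):
    """Concatenate per-cell histograms of the three gradient components.

    cells=1 gives a position-independent histogram; cells>1 keeps a
    separate histogram for each of cells**3 sub-volumes (hypothetical layout).
    """
    grads, (T, H, W) = gradients(vol)
    desc = [0.0] * (cells ** 3 * 3 * n_bins)
    for (t, y, x), g in grads.items():
        # Index of the sub-volume this voxel falls into, along each axis.
        ct = min(t * cells // T, cells - 1)
        cy = min(y * cells // H, cells - 1)
        cx = min(x * cells // W, cells - 1)
        cell = (ct * cells + cy) * cells + cx
        for comp, v in enumerate(g):
            # Clip to [-value_range, value_range], then pick a bin.
            v = max(-value_range, min(value_range, v))
            b = min(int((v + value_range) / (2 * value_range) * n_bins), n_bins - 1)
            desc[(cell * 3 + comp) * n_bins + b] += 1.0
    total = sum(desc)
    # Normalize so that the descriptor sums to one.
    return [d / total for d in desc] if total else desc
```

For a 6x6x6 patch, `histogram_descriptor(vol, cells=1)` returns 12 values and `histogram_descriptor(vol, cells=2)` returns 96. The longer vector keeps coarse information about where in the neighborhood each gradient occurred, which the pooled histogram discards.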

Ivan Laptev | Tony Lindeberg

[1] Takeo Kanade, et al. An Iterative Image Registration Technique with an Application to Stereo Vision, 1981, IJCAI.

[2] Editors, 1986, Brain Research Bulletin.

[3] T. Lindeberg, et al. Shape-adapted smoothing in estimation of 3-D depth cues from affine distortions of local 2-D structure, 1997.

[4] Mubarak Shah, et al. Motion-Based Recognition, 1997, Computational Imaging and Vision.

[5] Tony Lindeberg, et al. Shape-adapted smoothing in estimation of 3-D shape cues from affine deformations of local 2-D brightness structure, 1997, Image Vis. Comput.

[6] Hans-Hellmut Nagel, et al. Spatiotemporally Adaptive Estimation and Segmentation of OF-Fields, 1998, ECCV.

[7] Bernd Neumann, et al. Computer Vision — ECCV’98, 1998, Lecture Notes in Computer Science.

[8] Dariu Gavrila, et al. The Visual Analysis of Human Movement: A Survey, 1999, Comput. Vis. Image Underst.

[9] Michael J. Black, et al. Parameterized Modeling and Recognition of Activities, 1999, Comput. Vis. Image Underst.

[10] David G. Lowe, et al. Object recognition from local scale-invariant features, 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[11] James L. Crowley, et al. A Probabilistic Sensor for the Perception and Recognition of Activities, 2000, ECCV.

[12] Jesse Hoey, et al. Representation and recognition of complex human motion, 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[13] Christopher M. Bishop, et al. Non-linear Bayesian Image Modelling, 2000, ECCV.

[14] James W. Davis, et al. The Recognition of Human Movement Using Temporal Templates, 2001, IEEE Trans. Pattern Anal. Mach. Intell.

[15] Lihi Zelnik-Manor, et al. Event-based analysis of video, 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[16] Mads Nielsen, et al. Computer Vision — ECCV 2002, 2002, Lecture Notes in Computer Science.

[17] Tony Lindeberg, et al. Time-Recursive Velocity-Adapted Spatio-Temporal Scale-Space Filters, 2002, ECCV.

[18] Cordelia Schmid, et al. An Affine Invariant Interest Point Detector, 2002, ECCV.

[19] T. Lindeberg, et al. Velocity-adapted spatio-temporal receptive fields for direct recognition of activities, 2002.

[20] Pietro Perona, et al. Object class recognition by unsupervised scale-invariant learning, 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings.

[21] Jitendra Malik, et al. Recognizing action at a distance, 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[22] Patrick Bouthemy, et al. Motion Recognition Using Nonparametric Image Motion Models Estimated from Temporal and Multiscale Cooccurrence Statistics, 2003, IEEE Trans. Pattern Anal. Mach. Intell.

[23] Ivan Laptev, et al. On Space-Time Interest Points, 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[24] J. Koenderink, et al. Representation of local geometry in the visual system, 1987, Biological Cybernetics.

[25] Ivan Laptev, et al. Velocity adaptation of spatio-temporal receptive fields for direct recognition of activities: an experimental study, 2004, Image Vis. Comput.

[26] Yan Ke, et al. PCA-SIFT: a more distinctive representation for local image descriptors, 2004, CVPR 2004.

[27] Tony Lindeberg, et al. Feature Detection with Automatic Scale Selection, 1998, International Journal of Computer Vision.

[28] Barbara Caputo, et al. Recognizing human actions: a local SVM approach, 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004.

[29] Bernt Schiele, et al. Recognition without Correspondence using Multidimensional Receptive Field Histograms, 2004, International Journal of Computer Vision.

[30] Ivan Laptev, et al. Velocity adaptation of space-time interest points, 2004, ICPR 2004.

[31] Michael J. Black, et al. EigenTracking: Robust Matching and Tracking of Articulated Objects Using a View-Based Representation, 1996, International Journal of Computer Vision.

[32] Cordelia Schmid, et al. A Performance Evaluation of Local Descriptors, 2005, IEEE Trans. Pattern Anal. Mach. Intell.