On Space-Time Interest Points

Local image features or interest points provide compact and abstract representations of patterns in an image. In this paper, we extend the notion of spatial interest points into the spatio-temporal domain and show how the resulting features often reflect interesting events that can be used for a compact representation of video data as well as for interpretation of spatio-temporal events. To detect spatio-temporal events, we build on the idea of the Harris and Förstner interest point operators and detect local structures in space-time where the image values have significant local variations in both space and time. We estimate the spatio-temporal extents of the detected events by maximizing a normalized spatio-temporal Laplacian operator over spatial and temporal scales. To represent the detected events, we then compute local, spatio-temporal, scale-invariant N-jets and classify each event with respect to its jet descriptor. For the problem of human motion analysis, we illustrate how a video representation in terms of local space-time features allows for detection of walking people in scenes with occlusions and dynamic cluttered backgrounds.
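
The detection step described above can be illustrated with a small sketch. It is not the authors' implementation: it assumes one common way of extending the Harris operator to space-time, namely a 3x3 second-moment matrix mu built from Gaussian-smoothed derivatives along t, y and x, scored with H = det(mu) - k * trace(mu)^3. The function names (`gaussian_kernel`, `smooth`, `spacetime_harris`), the scale values `sigma` and `tau`, the integration factor `s` and the constant `k` are all assumptions chosen for illustration, and the scale selection and N-jet classification steps are left out.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Sampled, normalized 1-D Gaussian truncated at about 3 sigma."""
    radius = max(1, int(round(3.0 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    return kernel / kernel.sum()

def smooth(volume, sigmas):
    """Separable Gaussian smoothing of a (t, y, x) volume, one sigma per axis."""
    out = np.asarray(volume, dtype=float)
    for axis, sigma in enumerate(sigmas):
        kernel = gaussian_kernel(sigma)
        radius = len(kernel) // 2

        def conv1d(v, kernel=kernel, radius=radius):
            # Replicate border values so the output keeps the input length.
            return np.convolve(np.pad(v, radius, mode="edge"), kernel, mode="valid")

        out = np.apply_along_axis(conv1d, axis, out)
    return out

def spacetime_harris(volume, sigma=1.5, tau=1.5, s=2.0, k=0.005):
    """Harris-style corner response over space-time (all defaults are assumptions)."""
    local = (tau, sigma, sigma)                    # local scales for (t, y, x)
    integ = tuple(np.sqrt(s) * v for v in local)   # integration scales
    L = smooth(volume, local)
    grads = np.gradient(L)                         # [Lt, Ly, Lx]
    mu = np.empty(L.shape + (3, 3))
    for i in range(3):
        for j in range(i, 3):
            mu[..., i, j] = mu[..., j, i] = smooth(grads[i] * grads[j], integ)
    # Large only where the signal varies along all three axes at once.
    return np.linalg.det(mu) - k * np.trace(mu, axis1=-2, axis2=-1) ** 3

# Toy "event": a bright block that appears at t = 12 with a spatial corner at (12, 12).
video = np.zeros((24, 24, 24))
video[12:, 12:, 12:] = 1.0
H = spacetime_harris(video)
t, y, x = np.unravel_index(np.argmax(H), H.shape)
```

In this sketch a spatial corner that never changes over time has zero temporal derivative, so its second-moment matrix is rank deficient and the response is not positive. That is the intended difference from a purely spatial Harris detector: only structures with significant variation in both space and time are kept.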
