On Space-Time Interest Points

Local image features, or interest points, provide compact and abstract representations of patterns in an image. We propose to extend the notion of spatial interest points into the spatio-temporal domain and show how the resulting features often reflect interesting events that can be used for a compact representation of video data as well as for its interpretation. To detect spatio-temporal events, we build on the idea of the Harris and Förstner interest point operators and detect local structures in space-time where the image values have significant local variations in both space and time. We then estimate the spatio-temporal extents of the detected events and compute scale-invariant spatio-temporal descriptors for them. Using these descriptors, we classify events and construct a video representation in terms of labeled space-time points. For the problem of human motion analysis, we illustrate how the proposed method allows detection of walking people in scenes with occlusions and dynamic backgrounds.
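The detection step described above can be illustrated with a minimal sketch. It assumes a Harris-style corner response extended to three dimensions: a 3x3 second-moment matrix of the spatio-temporal gradients is integrated over a Gaussian window, and the response is taken as det(mu) - k * trace(mu)^3. The response form, the constant k = 0.005, the scale convention (integration scales equal to s times the local scales, as standard deviations) and all function names (`smooth`, `spacetime_harris`, `blob`) are assumptions made for illustration; none of them is stated in the text above. Scale selection and descriptor computation are not covered.

```python
import numpy as np


def gaussian_kernel(sigma):
    """1-D Gaussian kernel, normalised to sum to 1."""
    radius = max(1, int(round(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def smooth(volume, sigma, tau):
    """Separable Gaussian smoothing of a (t, y, x) volume.

    sigma is the spatial scale and tau the temporal scale,
    both given as standard deviations in voxels.
    """
    out = np.asarray(volume, dtype=float)
    for axis, scale in enumerate((tau, sigma, sigma)):
        kernel = gaussian_kernel(scale)
        radius = len(kernel) // 2

        def filter_1d(v):
            # Replicate border values so the output keeps the input length.
            padded = np.pad(v, radius, mode="edge")
            return np.convolve(padded, kernel, mode="valid")

        out = np.apply_along_axis(filter_1d, axis, out)
    return out


def spacetime_harris(volume, sigma=1.0, tau=1.0, s=2.0, k=0.005):
    """Assumed Harris-style response for a (t, y, x) video volume.

    High values mark points where the brightness varies strongly
    along x, y and t at the same time.
    """
    L = smooth(volume, sigma, tau)
    Lt, Ly, Lx = np.gradient(L)  # derivatives along axes 0, 1, 2
    grads = (Lx, Ly, Lt)

    # Second-moment matrix: windowed products of first derivatives.
    mu = np.empty(L.shape + (3, 3))
    for i in range(3):
        for j in range(i, 3):
            entry = smooth(grads[i] * grads[j], s * sigma, s * tau)
            mu[..., i, j] = mu[..., j, i] = entry

    det = np.linalg.det(mu)
    trace = np.trace(mu, axis1=-2, axis2=-1)
    return det - k * trace ** 3


def blob(shape, center, width):
    """Synthetic Gaussian blob in (t, y, x), used only as test input."""
    t, y, x = np.indices(shape)
    r2 = (t - center[0]) ** 2 + (y - center[1]) ** 2 + (x - center[2]) ** 2
    return np.exp(-r2 / (2.0 * width ** 2))


# Usage: a blob that appears and vanishes varies in space and in time.
volume = blob((24, 24, 24), (12, 12, 12), 1.5)
H = spacetime_harris(volume)
peak = np.unravel_index(np.argmax(H), H.shape)
print("strongest response at (t, y, x) =", peak)
```

In this sketch a static scene gives a non-positive response everywhere, because the temporal derivative vanishes and the determinant of the second-moment matrix is zero. Interest points would then be taken as positive local maxima of the response over space and time.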
