A Study on Sampling Strategies in Space-Time Domain for Recognition Applications

We investigate the relative strengths of existing space-time interest point operators in the context of action detection and recognition. The operators evaluated are an extension of the Harris corner detector (Laptev et al. [1]), a space-time Gabor filter (Dollar et al. [2]), and randomized sampling on the motion boundaries. In the first level of experiments, we study low-level attributes of the interest points, such as stability, repeatability and sparsity, with respect to sources of variation such as actor, viewpoint and action category. In the second level, we measure the discriminative power of the interest points by extracting generic region descriptors around them: histograms of optical flow [3], motion history images [4], and histograms of oriented gradients [3]. We then build a simple action recognition scheme by constructing a dictionary of codewords and learning a recognition system from the histograms of these codewords. We demonstrate that, although the structural information contained in the interest point detections may have merits, ultimately obtaining as many data samples as possible, even with random sampling, is the decisive factor in the interpretation of space-time data.
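The codeword dictionary and histogram step described above can be illustrated with a minimal sketch. It assumes the dictionary is built by plain k-means over local descriptors and that each video is represented by an L1-normalised histogram of nearest codewords; the abstract does not state the clustering method, the dictionary size or the descriptor dimension, so the function names, `k=8`, the 16-dimensional descriptors and the synthetic data are all illustrative assumptions.

```python
import numpy as np

def assign(descriptors, centers):
    """Index of the nearest codeword (Euclidean distance) for each descriptor."""
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def build_codebook(descriptors, k, n_iter=20, seed=0):
    """Plain k-means (Lloyd's algorithm); returns k codewords.

    Assumption: k-means is used here only as a common choice for
    building a codeword dictionary, not as the paper's stated method.
    """
    rng = np.random.default_rng(seed)
    # Initialise codewords from k distinct descriptors chosen at random.
    idx = rng.choice(len(descriptors), size=k, replace=False)
    centers = descriptors[idx].astype(float)
    for _ in range(n_iter):
        labels = assign(descriptors, centers)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members) > 0:  # keep the old centre if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers

def codeword_histogram(descriptors, centers):
    """L1-normalised histogram of codeword occurrences for one video."""
    labels = assign(descriptors, centers)
    counts = np.bincount(labels, minlength=len(centers)).astype(float)
    return counts / max(counts.sum(), 1.0)

# Synthetic stand-in data (hypothetical): 500 training descriptors of
# dimension 16, and 120 descriptors sampled from one "video".
rng = np.random.default_rng(1)
train_desc = rng.normal(size=(500, 16))
codebook = build_codebook(train_desc, k=8)
hist = codeword_histogram(rng.normal(size=(120, 16)), codebook)
```

Each video's histogram would then serve as the feature vector for whichever classifier is trained on top. Under this representation, a denser sampling strategy simply contributes more descriptors to each histogram.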

[1] James W. Davis, et al. The representation and recognition of human movement using temporal templates, 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[2] Serge J. Belongie, et al. Behavior recognition via sparse spatio-temporal features, 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[3] Frédéric Jurie, et al. Sampling Strategies for Bag-of-Features Image Classification, 2006, ECCV.

[4] Ronen Basri, et al. Actions as Space-Time Shapes, 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5] Cordelia Schmid, et al. Evaluation of Local Spatio-temporal Features for Action Recognition, 2009, BMVC.

[6] Nicu Sebe, et al. Computer Vision in Human-Computer Interaction, 2004, Lecture Notes in Computer Science.

[7] Ying Wu, et al. Discriminative subvolume search for efficient action detection, 2009, CVPR.

[8] Ivan Laptev, et al. Local Descriptors for Spatio-temporal Recognition, 2004, SCVMA.

[9] W. James MacLean. Spatial Coherence for Visual Motion Analysis, 2006.

[10] Ivan Laptev, et al. On Space-Time Interest Points, 2005, International Journal of Computer Vision.

[11] Axel Pinz, et al. Computer Vision – ECCV 2006, 2006, Lecture Notes in Computer Science.

[12] Gabriela Csurka, et al. Visual categorization with bags of keypoints, 2002, ECCV 2004.

[13] Jake K. Aggarwal, et al. Human Motion Analysis: A Review, 1999, Comput. Vis. Image Underst.

[14] Jake K. Aggarwal, et al. Human motion analysis: a review, 1997, Proceedings IEEE Nonrigid and Articulated Motion Workshop.

[15] E. Nowak, et al. Vehicle Categorization: Parts for Speed and Accuracy, 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[16] Chih-Jen Lin, et al. LIBSVM: A library for support vector machines, 2011, TIST.

[17] Thomas Serre, et al. A Biologically Inspired System for Action Recognition, 2007, 2007 IEEE 11th International Conference on Computer Vision.

[18] Barbara Caputo, et al. Recognizing human actions: a local SVM approach, 2004, ICPR 2004.