Spatio-temporal Shape and Flow Correlation for Action Recognition

This paper explores the use of volumetric features for action recognition. First, we propose a novel method to correlate spatio-temporal shapes to video clips that have been automatically segmented. Our method works on over-segmented videos, which means that we do not require background subtraction for reliable object segmentation. Next, we discuss and demonstrate the complementary nature of shape- and flow-based features for action recognition. Our method, when combined with a recent flow-based correlation technique, can detect a wide range of actions in video, as demonstrated by results on a long tennis video. Although not specifically designed for whole-video classification, we also show that our method's performance is competitive with current action classification techniques on a standard video classification dataset.

[1]  Dorin Comaniciu,et al.  An Algorithm for Data-Driven Bandwidth Selection , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[2]  Ser Peow Tan,et al.  Normalizers of the congruence subgroups of the Hecke group II , 1999 .

[3]  Juan Carlos Niebles,et al.  Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words , 2006, BMVC.

[4]  David S. Doermann,et al.  Video retrieval of near-duplicates using κ-nearest neighbor retrieval of spatio-temporal descriptors , 2006, Multimedia Tools and Applications.

[5]  Yizong Cheng,et al.  Mean Shift, Mode Seeking, and Clustering , 1995, IEEE Trans. Pattern Anal. Mach. Intell..

[6]  Yee Leung,et al.  Clustering by Scale-Space Filtering , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[7]  Jitendra Malik,et al.  A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[8]  Eli Shechtman,et al.  Space-time behavior based correlation , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[9]  James W. Davis,et al.  The Recognition of Human Movement Using Temporal Templates , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[10]  Dorin Comaniciu,et al.  Mean Shift: A Robust Approach Toward Feature Space Analysis , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[11]  M. Shah,et al.  Actions As Objects : A Novel Action Representation , 2005 .

[12]  Ronen Basri,et al.  Actions as space-time shapes , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[13]  Martial Hebert,et al.  Efficient visual event detection using volumetric features , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[14]  Ze-Nian Li,et al.  Successive Convex Matching for Action Detection , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[15]  Barbara Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[16]  Bo Thiesson,et al.  Image and Video Segmentation by Anisotropic Kernel Mean Shift , 2004, ECCV.

[17]  Jitendra Malik,et al.  Shape Guided Object Segmentation , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[18]  Serge J. Belongie,et al.  Behavior recognition via sparse spatio-temporal features , 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[19]  Greg Mori,et al.  Guiding model search using segmentation , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[20]  Jitendra Malik,et al.  Learning a classification model for segmentation , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.