Actions as Space-Time Shapes

Human action in video sequences can be seen as silhouettes of a moving torso and protruding limbs undergoing articulated motion. We regard human actions as three-dimensional shapes induced by the silhouettes in the space-time volume. We adopt a recent approach [14] for analyzing 2D shapes and generalize it to deal with volumetric space-time action shapes. Our method utilizes properties of the solution to the Poisson equation to extract space-time features such as local space-time saliency, action dynamics, shape structure, and orientation. We show that these features are useful for action recognition, detection, and clustering. The method is fast, does not require video alignment, and is applicable in (but not limited to) many scenarios where the background is known. Moreover, we demonstrate the robustness of our method to partial occlusions, nonrigid deformations, significant changes in scale and viewpoint, high irregularities in the performance of an action, and low-quality video.
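To make the Poisson-based idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes the usual formulation from the Poisson-equation shape work: solve ΔU = -1 inside the binary space-time shape with U = 0 outside it, so that U grows with distance from the boundary. The function name `poisson_space_time`, the unit grid spacing, the iteration count, and the plain Jacobi solver are all assumptions chosen for brevity.

```python
import numpy as np

def poisson_space_time(mask, n_iters=200):
    """Approximate U with Laplacian(U) = -1 inside `mask` and U = 0 outside.

    `mask` is a boolean (x, y, t) volume built from stacked silhouettes.
    Hypothetical helper: plain Jacobi iteration on a unit grid.
    """
    mask = np.asarray(mask, dtype=bool)
    # Pad with "outside" voxels so every shape voxel has a full neighbourhood.
    inside = np.pad(mask, 1, constant_values=False)
    u = np.zeros(inside.shape, dtype=float)
    n_neighbours = 2 * u.ndim  # 6 neighbours in a 3D volume
    for _ in range(n_iters):
        # Sum of the axis-aligned neighbours of every voxel.
        total = sum(np.roll(u, shift, axis=axis)
                    for axis in range(u.ndim) for shift in (-1, 1))
        # Discrete Poisson update inside the shape; zero elsewhere.
        u = np.where(inside, (total + 1.0) / n_neighbours, 0.0)
    # Remove the padding so the result aligns with the input mask.
    return u[(slice(1, -1),) * u.ndim]

# Toy example: a 5x5x5 block standing in for a space-time shape.
toy = np.zeros((9, 9, 9), dtype=bool)
toy[2:7, 2:7, 2:7] = True
u = poisson_space_time(toy)
```

In this sketch, large values of `u` mark voxels deep inside the shape and small values mark voxels near its surface, which is the kind of information that local features such as saliency or orientation could be derived from.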

[1] Ramesh C. Jain, et al. Invariant surface characteristics for 3D object recognition in range images, 1985, Comput. Vis. Graph. Image Process.

[2] Ramakant Nevatia, et al. Matching 3-D objects using surface descriptions, 1988, Proceedings. 1988 IEEE International Conference on Robotics and Automation.

[3] Edward H. Adelson, et al. Analyzing and recognizing walking figures in XYT, 1994, 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[4] Sven J. Dickinson, et al. Recognition by functional parts [function-based object recognition], 1994, 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[5] Azriel Rosenfeld, et al. Recognition by Functional Parts, 1995, Comput. Vis. Image Underst.

[6] Michael J. Black, et al. Cardboard people: a parameterized model of articulated image motion, 1996, Proceedings of the Second International Conference on Automatic Face and Gesture Recognition.

[7] Christoph Bregler, et al. Learning and recognizing human dynamics in video sequences, 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[8] Michael J. Black. Explaining optical flow events with parameterized spatio-temporal models, 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[9] Stefan Carlsson, et al. Order Structure, Correspondence, and Shape Based Categories, 1999, Shape, Contour and Grouping in Computer Vision.

[10] Michael J. Black, et al. Parameterized Modeling and Recognition of Activities, 1999, Comput. Vis. Image Underst.

[11] James L. Crowley, et al. A Probabilistic Sensor for the Perception and Recognition of Activities, 2000, ECCV.

[12] James L. Crowley, et al. A probabilistic sensor for the perception of activities, 2000, Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580).

[13] James W. Davis, et al. The Recognition of Human Movement Using Temporal Templates, 2001, IEEE Trans. Pattern Anal. Mach. Intell.

[14] Michael I. Jordan, et al. On Spectral Clustering: Analysis and an algorithm, 2001, NIPS.

[15] Lihi Zelnik-Manor, et al. Event-based analysis of video, 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[16] J. Sullivan, et al. Action Recognition by Shape Matching to Key Frames, 2002.

[17] Philip N. Klein, et al. Shock-Based Indexing into Large Shape Databases, 2002, ECCV.

[18] Jitendra Malik, et al. Recognizing action at a distance, 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[19] Ivan Laptev, et al. On Space-Time Interest Points, 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[20] Ali Shokoufandeh, et al. Shock Graphs and Shape Matching, 1998, International Journal of Computer Vision.

[21] Randal C. Nelson, et al. Detection and Recognition of Periodic, Nonrigid Motion, 1997, International Journal of Computer Vision.

[22] Steven M. Seitz, et al. View-Invariant Analysis of Cyclic Motion, 1997, International Journal of Computer Vision.

[23] Remco C. Veltkamp, et al. A Survey of Content Based 3D Shape Retrieval Methods, 2004, SMI.

[24] Ronen Basri, et al. Actions as space-time shapes, 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[25] Martial Hebert, et al. Efficient visual event detection using volumetric features, 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[26] Eli Shechtman, et al. Space-time behavior based correlation, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[27] Mubarak Shah, et al. Actions sketch: a novel action representation, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[28] Roman Goldenberg, et al. Behavior classification by eigendecomposition of periodic motions, 2005, Pattern Recognit.

[29] Ronen Basri, et al. Shape Representation and Classification Using the Poisson Equation, 2006, IEEE Trans. Pattern Anal. Mach. Intell.

[30] Heiko Neumann, et al. Sketching shiny surfaces: 3D shape extraction and depiction of specular surfaces, 2006, TAP.