3D action matching with key-pose detection

This paper addresses the problem of human action matching in outdoor sports broadcast environments, by analysing 3D data from a recorded human activity and retrieving the most appropriate proxy action from a motion capture library. Typically pose recognition is carried out using images from a single camera, however this approach is sensitive to occlusions and restricted fields of view, both of which are common in the outdoor sports environment. This paper presents a novel technique for the automatic matching of human activities which operates on the 3D data available in a multi-camera broadcast environment. Shape is retrieved using multi-camera techniques to generate a 3D representation of the scene. Use of 3D data renders the system camera-pose-invariant and allows it to work while cameras are moving and zooming. By comparing the reconstructions to an appropriate 3D library, action matching can be achieved in the presence of significant calibration and matting errors which cause traditional pose detection schemes to fail. An appropriate feature descriptor and distance metric are presented as well as a technique to use these features for key-pose detection and action matching. The technique is then applied to real footage captured at an outdoor sporting event.

[1]  Adrian Hilton,et al.  Automatic 3D Video Summarization: Key Frame Extraction from Self-Similarity , 2008 .

[2]  Vincent Lepetit,et al.  Human body pose detection using Bayesian spatio-temporal templates , 2006, Comput. Vis. Image Underst..

[3]  Stefan Carlsson,et al.  Monocular 3D Reconstruction of Human Motion in Long Action Sequences , 2004, ECCV.

[4]  Takeo Kanade,et al.  Virtualized Reality: Constructing Virtual Worlds from Real Scenes , 1997, IEEE Multim..

[5]  Jitendra Malik,et al.  Recognizing action at a distance , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[6]  Adrian Hilton,et al.  A Free-Viewpoint Video System for Visualization of Sport Scenes , 2007 .

[7]  Oliver Grau,et al.  Dual-Mode Deformable Models for Free-Viewpoint Video of Sports Events , 2007, Sixth International Conference on 3-D Digital Imaging and Modeling (3DIM 2007).

[8]  Michael J. Black,et al.  Detailed Human Shape and Pose from Images , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  A. Laurentini,et al.  The Visual Hull Concept for Silhouette-Based Image Understanding , 1994, IEEE Trans. Pattern Anal. Mach. Intell..

[10]  Jean-Yves Guillemaut,et al.  Robust graph-cut scene segmentation and reconstruction for free-viewpoint video of complex dynamic scenes , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[11]  Hans-Peter Kriegel,et al.  3D Shape Histograms for Similarity Search and Classification in Spatial Databases , 1999, SSD.

[12]  Michael J. Black,et al.  A Quantitative Evaluation of Video-based 3D Person Tracking , 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[13]  Adrian Hilton,et al.  A Study of Shape Similarity for Temporal Surface Sequences of People , 2007, Sixth International Conference on 3-D Digital Imaging and Modeling (3DIM 2007).

[14]  Hideo Saito,et al.  Arbitrary viewpoint observation for soccer match video , 2004 .

[15]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[16]  Luc Van Gool,et al.  Articulated Multi-body Tracking under Egomotion , 2008, ECCV.

[17]  Szymon Rusinkiewicz,et al.  Rotation Invariant Spherical Harmonic Representation of 3D Shape Descriptors , 2003, Symposium on Geometry Processing.

[18]  Ian D. Reid,et al.  A Multiple View Layered Representation for Dynamic Novel View Synthesis , 2003, BMVC.

[19]  Rémi Ronfard,et al.  Action Recognition from Arbitrary Views using 3D Exemplars , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[20]  James J. Little,et al.  Simultaneous Tracking and Action Recognition using the PCA-HOG Descriptor , 2006, The 3rd Canadian Conference on Computer and Robot Vision (CRV'06).

[21]  Andrew Zisserman,et al.  Progressive search space reduction for human pose estimation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Andrew E. Johnson,et al.  Spin-Images: A Representation for 3-D Surface Matching , 1997 .