A unified spatio-temporal articulated model for tracking

Tracking articulated objects in image sequences remains a challenging problem, particularly when it comes to localizing the individual parts of an object under self-occlusion and changes in viewpoint. In this paper we propose a two-dimensional spatio-temporal modeling approach that handles both. We use a Bayesian framework to combine pictorial structure spatial models with hidden Markov temporal models. Inference for these combined models can be performed using dynamic programming and sampling methods. We demonstrate the approach on the problem of tracking a walking person, using silhouette data taken from a single camera viewpoint. Walking provides both strong spatial (kinematic) and temporal (dynamic) constraints, enabling the method to track limb positions in spite of simultaneous self-occlusion and viewpoint change.
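
To make the dynamic-programming part of the inference concrete, the following is a minimal sketch of the standard Viterbi recursion for a discrete hidden Markov model. It is not the model from this paper: the three cyclic "gait phase" states, the transition table, and the per-frame likelihoods are hypothetical toy values chosen only for illustration, and the spatial (pictorial structure) term is assumed to have been folded into the per-frame likelihoods already.

```python
import math


def safe_log(p):
    """Log of a probability, mapping zero to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(log_init, log_trans, log_emit):
    """Most likely state path for a discrete HMM, computed in log space.

    log_init[s]     : log P(state_0 = s)
    log_trans[r][s] : log P(state_t = s | state_{t-1} = r)
    log_emit[t][s]  : log P(observation_t | state_t = s)
    """
    n_states = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    back = []  # back[t-1][s] = best predecessor of state s at time t
    for t in range(1, len(log_emit)):
        new_score, pointers = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda r: score[r] + log_trans[r][s])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + log_emit[t][s])
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    last = max(range(n_states), key=lambda s: score[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]


# Hypothetical toy model: 3 gait phases that either persist or advance cyclically.
N = 3
init = [1.0 / N] * N
trans = [[0.6 if s == r else 0.4 if s == (r + 1) % N else 0.0
          for s in range(N)] for r in range(N)]
# Hypothetical per-frame likelihoods of the silhouette given each phase.
emit = [[0.8, 0.1, 0.1],
        [0.3, 0.6, 0.1],
        [0.1, 0.2, 0.7],
        [0.7, 0.2, 0.1]]

path, score = viterbi([safe_log(p) for p in init],
                      [[safe_log(p) for p in row] for row in trans],
                      [[safe_log(p) for p in row] for row in emit])
print(path)  # [0, 1, 2, 0]
```

Working in log space avoids numerical underflow on long sequences, and the cost is quadratic in the number of states per frame, which is why a small discrete state space matters for this style of inference.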
