Tracking self-occluding articulated objects in dense disparity maps

In this paper, we present an algorithm for real-time tracking of articulated structures in dense disparity maps derived from stereo image sequences. A statistical image formation model that accounts for occlusions plays the central role in our tracking approach. This graphical model (a Bayesian network) assumes that the range image of each part of the structure is formed by drawing the depth candidates from a 3-D Gaussian distribution. The advantage over the classical mixture of Gaussians is that our model takes into account occlusions by picking the minimum depth (which could be regarded as a probabilistic version of z-buffering). The model also enforces articulation constraints among the parts of the structure. The tracking problem is formulated as an inference problem in the image formation model. This model can be extended and used for other tasks in addition to the one described in the paper and can also be used for estimating probability distribution functions instead of the ML estimates of the tracked parameters. For the purposes of real-time tracking, we used certain approximations in the inference process, which resulted in a real-time two-stage inference algorithm. We were able to successfully track upper human body motion in real time and in the presence of self-occlusions.

[1]  J. G. Gander,et al.  An introduction to signal detection and estimation , 1990 .

[2]  Alex Pentland,et al.  Dynamic models of human motion , 1998, Proceedings Third IEEE International Conference on Automatic Face and Gesture Recognition.

[3]  Takuya Kondo,et al.  Incremental tracking of human actions from multiple views , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[4]  Dimitris N. Metaxas,et al.  Shape and Nonrigid Motion Estimation Through Physics-Based Synthesis , 1993, IEEE Trans. Pattern Anal. Mach. Intell..

[5]  Christoph Bregler,et al.  Learning and recognizing human dynamics in video sequences , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[6]  Robert C. Bolles,et al.  Background modeling for segmentation of video-rate stereo sequences , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[7]  Kurt Konolige,et al.  Small Vision Systems: Hardware and Implementation , 1998 .

[8]  Alex Pentland,et al.  Pfinder: Real-Time Tracking of the Human Body , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[9]  Brendan J. Frey,et al.  Graphical Models for Machine Learning and Digital Communication , 1998 .

[10]  Alex Pentland,et al.  Real-time self-calibrating stereo person tracking using 3-D shape estimation from blob features , 1996, Proceedings of 13th International Conference on Pattern Recognition.

[11]  Ioannis A. Kakadiaris,et al.  Model-based estimation of 3D human motion with occlusion based on active multi-viewpoint selection , 1996, Proceedings CVPR IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[12]  Jitendra Malik,et al.  Tracking people with twists and exponential maps , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[13]  Michael J. Black,et al.  Cardboard people: a parameterized model of articulated image motion , 1996, Proceedings of the Second International Conference on Automatic Face and Gesture Recognition.

[14]  Olivier Faugeras,et al.  Three-Dimensional Computer Vision , 1993 .

[15]  Takeo Kanade,et al.  Model-based tracking of self-occluding articulated objects , 1995, Proceedings of IEEE International Conference on Computer Vision.