Simultaneous pose recovery and camera registration from multiple views of a walking person

We present an algorithm to estimate the body pose of a walking person given synchronized video input from multiple uncalibrated cameras. We construct an appearance model of human walking motion by generating examples from the space of body poses and camera locations, and clustering them using expectation-maximization. Given a segmented input video sequence, we find the closest matching appearance cluster for each silhouette and use the sequence of matched clusters to extrapolate the position of the camera with respect to the person's direction of motion. For each frame, the matching cluster also provides an estimate of the walking phase. We combine these estimates from all views and find the most likely sequence of walking poses using a cyclical, feed-forward hidden Markov model. Our algorithm requires no manual initialization and no prior knowledge about the locations of the cameras.

[1]  Andrew J. Viterbi,et al.  Error bounds for convolutional codes and an asymptotically optimum decoding algorithm , 1967, IEEE Trans. Inf. Theory.

[2]  Joshua N. Migdal,et al.  Robust motion segmentation using Markov thresholds , 2003 .

[3]  Jeff A. Bilmes,et al.  A gentle tutorial of the em algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models , 1998 .

[4]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[5]  Kinh Tieu,et al.  Learning pedestrian models for silhouette refinement , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[6]  Rómer Rosales,et al.  Inferring body pose without tracking body parts , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[7]  W. Eric L. Grimson,et al.  Adaptive background mixture models for real-time tracking , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[8]  Mohan M. Trivedi,et al.  Articulated body posture estimation from multi-camera voxel data , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[9]  Trevor Darrell,et al.  Inferring 3D structure with a statistical image-based shape model , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[10]  Rómer Rosales,et al.  Specialized mappings and the estimation of human body pose from a single image , 2000, Proceedings Workshop on Human Motion.

[11]  William T. Freeman,et al.  Bayesian Reconstruction of 3D Human Motion from Single-Camera Video , 1999, NIPS.

[12]  James M. Rehg,et al.  A multiple hypothesis approach to figure tracking , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[13]  Ramakant Nevatia,et al.  Segmentation and tracking of multiple humans in complex situations , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[14]  James M. Rehg,et al.  Singularity analysis for articulated object tracking , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[15]  Jitendra Malik,et al.  Tracking people with twists and exponential maps , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[16]  G. Johansson Visual motion perception. , 1975, Scientific American.

[17]  David G. Stork,et al.  Pattern Classification , 1973 .