Human Pose Estimation for Multiple Persons Based on Volume Reconstruction

Most work on pose recognition has focused on a single person. However, many computer vision applications require estimating the poses of multiple people. In this paper, we therefore address the problem of estimating the poses of multiple persons using volumes reconstructed from multiple cameras. One of the main difficulties in multi-person reconstruction from multiple cameras is the presence of ‘ghost’ volumes. These arise when the projections of the silhouettes of two different persons into the 3D world overlap at a location where no person is actually present. To solve this problem, we first introduce a novel principal axis-based framework to estimate the 3D ground plane positions of multiple people, and then use these position cues to label the multi-person volumes (voxels) while taking voxel connectivity into account. Having labeled the voxels, we fit a body model to the volume of each person and determine the person's pose from the fitted model. Results on real videos demonstrate the accuracy and efficiency of our approach.
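To make the ghost-volume problem concrete, the following is a minimal toy sketch, not the method of this paper. It assumes a top-down 8x8 ground-plane grid, two hypothetical orthographic cameras aligned with the grid axes, and persons reduced to single cells. The function names (silhouette_1d, carve, label_cells) and the distance threshold max_dist are illustrative assumptions, and the person positions are taken as given here, whereas the paper estimates them with its principal axis-based framework.

```python
from itertools import product


def silhouette_1d(people, axis):
    """Orthographic silhouette: the set of coordinates occupied along one axis."""
    return {p[axis] for p in people}


def carve(sil_x, sil_y, size):
    """Keep every grid cell whose projections fall inside both silhouettes."""
    return {
        (x, y)
        for x, y in product(range(size), repeat=2)
        if x in sil_x and y in sil_y
    }


def label_cells(cells, positions, max_dist):
    """Assign each cell to the nearest position (Manhattan distance).

    Cells farther than max_dist from every position get None, which marks
    them as ghost candidates. This threshold rule is an assumption made for
    illustration only.
    """
    labels = {}
    for cell in cells:
        dists = [abs(cell[0] - px) + abs(cell[1] - py) for px, py in positions]
        best = min(range(len(positions)), key=dists.__getitem__)
        labels[cell] = best if dists[best] <= max_dist else None
    return labels


# Two persons standing at different ground-plane cells (toy data).
people = [(2, 2), (6, 6)]

# Each camera observes only the coordinates occupied along its own axis.
sil_x = silhouette_1d(people, 0)
sil_y = silhouette_1d(people, 1)

# Intersecting the two silhouettes yields four cells, two of which are ghosts.
hull = carve(sil_x, sil_y, size=8)
ghosts = hull - set(people)

# With known ground-plane positions, the ghost cells can be told apart.
labels = label_cells(hull, people, max_dist=1)
```

In this toy setup the carved region contains the two real cells plus two ghost cells at the crossed coordinates, and the position cue is what separates them. A real system would additionally use voxel connectivity, as described above.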