A Robust Stereo Prior for Human Segmentation

The emergence of affordable depth cameras has enabled significant advances in human segmentation and pose estimation in recent years. While these cameras lead to impressive results in many tasks, the use of infrared cameras has its drawbacks, in particular the fact that such cameras do not work in direct sunlight. One alternative is to use a stereo pair of cameras to produce a disparity space image. In this work, we propose a robust method of using a disparity space image to create a prior for human segmentation. This new prior leads to greatly improved segmentation results, and it can be applied to any task where a stereo pair of cameras is available and segmentation is required. As an application, we show how the prior can be inserted into a dual decomposition formulation for stereo, segmentation and human pose estimation.
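The abstract does not say how the disparity space image is turned into a prior. The following is a minimal sketch of one possible mapping, assuming a hypothetical logistic heuristic; it is not the authors' method. It relies on the standard stereo relation that depth is inversely proportional to disparity (Z = fB/d), so pixels with large disparity lie close to the camera pair and, in a typical scene, are more likely to belong to a person standing in front of the background. The names `disparity_prior`, `unary_costs`, `d_thresh`, `scale` and `eps` are all hypothetical, as is the convention that negative disparities mark failed stereo matches. The resulting negative log-probabilities are the kind of unary term a binary segmentation energy could consume.

```python
import math
from typing import List, Tuple


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def disparity_prior(
    disparity: List[List[float]], d_thresh: float, scale: float = 1.0
) -> List[List[float]]:
    """Map each pixel's disparity to a foreground probability in (0, 1).

    Hypothetical heuristic: larger disparity means closer to the cameras,
    so disparities well above d_thresh are treated as likely foreground.
    Negative values are assumed to mark failed stereo matches and receive
    an uninformative prior of 0.5.
    """
    prior = []
    for row in disparity:
        out_row = []
        for d in row:
            if d < 0:
                out_row.append(0.5)
            else:
                # Logistic curve centred at d_thresh; `scale` controls its softness.
                out_row.append(_sigmoid((d - d_thresh) / scale))
        prior.append(out_row)
    return prior


def unary_costs(
    prior: List[List[float]], eps: float = 1e-6
) -> List[List[Tuple[float, float]]]:
    """Convert probabilities into (foreground_cost, background_cost) pairs.

    Costs are negative log-probabilities; eps clamps values away from 0
    so that math.log never receives zero.
    """
    return [
        [(-math.log(max(p, eps)), -math.log(max(1.0 - p, eps))) for p in row]
        for row in prior
    ]


# Usage on a tiny 2x2 disparity map (values are illustrative only).
disp = [[2.0, 30.0], [-1.0, 16.0]]
prior = disparity_prior(disp, d_thresh=16.0, scale=2.0)
costs = unary_costs(prior)
```

A soft logistic mapping is used here instead of a hard threshold so that pixels with disparities near `d_thresh`, or with no valid disparity at all, leave the decision to the other terms of the segmentation energy rather than forcing a label.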
