Human Recognition in RGBD Combining Object Detectors and Conditional Random Fields

This paper addresses the problem of detecting and segmenting human instances in a point cloud. Both fields have been well studied during the last decades showing impressive results, not only in accuracy but also in computational performance. With the rapid use of depth sensors, a resurgent need for improving existing state-of-the-art algorithms, integrating depth information as an additional constraint became more ostensible. Current challenges involve combining RGB and depth information for reasoning about location and spatial extend of the object of interest. We make use of an improved deformable part model algorithm, allowing to deform the individual parts across multiple scales, approximating the location of the person in the scene and a conditional random field energy function for specifying the object’s spatial extent. Our proposed energy function models up to pairwise relations defined in the RGBD domain, enforcing label consistency for regions sharing similar unary and pairwise measurements. Experimental results show that our proposed energy func- tion provides a fairly precise segmentation even when the resulting detection box is imprecise. Reasoning about the detection algorithm could potentially enhance the quality of the detection box allowing capturing the object of interest as a whole.

[1]  Sebastian Thrun,et al.  Learning to Segment and Track in RGBD , 2012, WAFR.

[2]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[3]  Thomas Hofmann,et al.  Large Margin Methods for Structured and Interdependent Output Variables , 2005, J. Mach. Learn. Res..

[4]  Derek Hoiem,et al.  Learning CRFs Using Graph Cuts , 2008, ECCV.

[5]  Jitendra Malik,et al.  Learning Rich Features from RGB-D Images for Object Detection and Segmentation , 2014, ECCV.

[6]  Luc Van Gool,et al.  The Pascal Visual Object Classes Challenge: A Retrospective , 2014, International Journal of Computer Vision.

[7]  Thorsten Joachims,et al.  Cutting-plane training of structural SVMs , 2009, Machine Learning.

[8]  John Platt,et al.  Probabilistic Outputs for Support vector Machines and Comparisons to Regularized Likelihood Methods , 1999 .

[9]  Vibhav Vineet,et al.  Human Instance Segmentation from Video using Detector-based Conditional Random Fields , 2011, BMVC.

[10]  Vladimir Kolmogorov,et al.  An experimental comparison of min-cut/max- flow algorithms for energy minimization in vision , 2001, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Dieter Fox,et al.  Detection-based object labeling in 3D scenes , 2012, 2012 IEEE International Conference on Robotics and Automation.

[12]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Philip H. S. Torr,et al.  What, Where and How Many? Combining Object Detectors and CRFs , 2010, ECCV.

[14]  Ronny Hänsch Generic object categorization in PolSAR images - and beyond , 2014 .

[15]  Jitendra Malik,et al.  Simultaneous Detection and Segmentation , 2014, ECCV.

[16]  Afshin Dehghan,et al.  Improving an Object Detector and Extracting Regions Using Superpixels , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  François Fleuret,et al.  Deformable Part Models with Individual Part Scaling , 2013, BMVC.