Learning to Segment and Track in RGBD

We consider the problem of segmenting and tracking deformable objects in color video with depth (RGBD) data available from commodity sensors such as the Asus Xtion Pro Live or Microsoft Kinect. We frame this problem with very few assumptions (no prior object model, no stationary sensor, and no prior 3-D map), which makes a solution potentially useful for a large number of applications, including semi-supervised learning, 3-D model capture, and object recognition. Our approach uses a rich feature set, including local image appearance, depth discontinuities, optical flow, and surface normals, to inform the segmentation decision in a conditional random field model. In contrast to previous work in this field, the proposed method learns how best to use these features from ground-truth segmented sequences. We provide qualitative and quantitative analyses which demonstrate substantial improvement over the state of the art. This paper extends our previous work: we show that simple algorithmic optimizations to the original method yield an order-of-magnitude speedup and thus real-time performance (~20 FPS) on a laptop computer. This speedup comes at only a minor cost in overall accuracy and makes the approach applicable to a broader range of tasks. We demonstrate one such task: real-time, online, interactive segmentation to efficiently collect training data for an off-the-shelf object detector.
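To make the role of the conditional random field concrete, the following is a minimal illustrative sketch, not the method's actual implementation. It assumes a binary foreground/background labeling whose energy is a weighted sum of per-pixel (node) and neighbor (edge) feature terms. All names (`energy`, `segment`, `w_node`, `w_edge`), the feature values, and the weights are hypothetical; in the approach described above the weights would be learned from ground-truth segmented sequences rather than set by hand, and the brute-force minimization shown here is only feasible for toy inputs.

```python
from itertools import product


def energy(labels, node_feats, edge_feats, w_node, w_edge):
    """Energy of a binary labeling: weighted node terms plus weighted edge terms."""
    total = 0.0
    # Node terms: a positive weighted score favors foreground (label 1),
    # a negative one favors background (label 0).
    for y, feats in zip(labels, node_feats):
        score = sum(w * f for w, f in zip(w_node, feats))
        total += -score if y == 1 else score
    # Edge terms: pay a weighted penalty whenever two neighbors disagree.
    for (i, j), feats in edge_feats.items():
        if labels[i] != labels[j]:
            total += sum(w * f for w, f in zip(w_edge, feats))
    return total


def segment(node_feats, edge_feats, w_node, w_edge):
    """Brute-force minimizer over all labelings (toy sizes only)."""
    n = len(node_feats)
    return min(
        product((0, 1), repeat=n),
        key=lambda ys: energy(ys, node_feats, edge_feats, w_node, w_edge),
    )


# Hypothetical 4-pixel chain with two node features per pixel
# (e.g. an appearance cue and a depth cue, both made up here).
node_feats = [(0.9, 0.8), (0.1, -0.2), (-0.7, -0.6), (-0.8, -0.9)]
# One edge feature per neighbor pair; the small value on (1, 2) mimics
# a depth discontinuity, where cutting the segmentation is cheap.
edge_feats = {(0, 1): (1.0,), (1, 2): (0.1,), (2, 3): (1.0,)}
w_node = (1.0, 1.0)  # hand-set here; learned in the described approach
w_edge = (0.5,)

best = segment(node_feats, edge_feats, w_node, w_edge)
print(best)
```

In this toy example the second pixel has weak node evidence for background, but the strong edge to its foreground neighbor and the cheap cut at the simulated depth discontinuity pull it into the foreground. This is the kind of trade-off that the feature weights control.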
