Detecting, segmenting and tracking unknown objects using multi-label MRF inference

This article presents a unified framework for detecting, segmenting and tracking unknown objects in everyday scenes, allowing object hypotheses to be inspected during interaction over time. A heterogeneous scene representation is proposed, with background regions modeled as a combination of planar surfaces and uniform clutter, and foreground objects as 3D ellipsoids. Recent energy minimization methods based on loopy belief propagation, tree-reweighted message passing and graph cuts are studied for multi-object segmentation and benchmarked in terms of segmentation quality, computational speed and ease of adaptation to parallel processing. One conclusion is that the choice of energy minimization method matters less than the way scenes are modeled. Proximity is more valuable for segmentation than color similarity, while the benefit of 3D information is limited. Practical experiments also show that, with GPU implementations, multi-object segmentation and tracking using state-of-the-art MRF inference methods is feasible, despite the computational costs typically associated with such methods.
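To make the notion of multi-label MRF energy minimization concrete, the following minimal sketch evaluates a Potts-type energy on a 4-connected pixel grid and lowers it greedily. It is an illustration under stated assumptions, not the article's method: the optimizer is iterated conditional modes (ICM), chosen only for brevity and not one of the methods benchmarked in the article, and the names potts_energy, icm, unary and smooth are hypothetical.

```python
def potts_energy(labels, unary, smooth):
    """Sum of per-pixel unary costs plus `smooth` for every
    4-neighbour pair whose labels disagree (Potts pairwise term)."""
    h, w = len(labels), len(labels[0])
    energy = 0.0
    for y in range(h):
        for x in range(w):
            energy += unary[y][x][labels[y][x]]
            # Count each edge once: right and down neighbours only.
            if x + 1 < w and labels[y][x] != labels[y][x + 1]:
                energy += smooth
            if y + 1 < h and labels[y][x] != labels[y + 1][x]:
                energy += smooth
    return energy


def icm(unary, smooth, sweeps=10):
    """Iterated conditional modes: start from the best unary label at
    each pixel, then repeatedly relabel single pixels whenever that
    strictly lowers the local energy. Finds a local minimum only."""
    h, w = len(unary), len(unary[0])
    k = len(unary[0][0])  # number of labels
    labels = [[min(range(k), key=lambda l: unary[y][x][l])
               for x in range(w)] for y in range(h)]
    for _ in range(sweeps):
        changed = False
        for y in range(h):
            for x in range(w):
                nbrs = [labels[ny][nx]
                        for ny, nx in ((y - 1, x), (y + 1, x),
                                       (y, x - 1), (y, x + 1))
                        if 0 <= ny < h and 0 <= nx < w]

                def local(l):
                    # Energy terms that depend on this pixel's label.
                    return unary[y][x][l] + smooth * sum(n != l for n in nbrs)

                best = min(range(k), key=local)
                if local(best) < local(labels[y][x]):
                    labels[y][x] = best
                    changed = True
        if not changed:
            break
    return labels
```

Here unary[y][x][l] is the cost of giving pixel (y, x) label l, for instance one label per object hypothesis plus background labels, and smooth penalizes label changes between neighbouring pixels. Message-passing or graph-cut methods minimize the same kind of energy but can escape the local minima that ICM settles into.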
