Object Segmentation Ensuring Consistency Across Multi-Viewpoint Images

We present a hybrid approach that segments an object by using both color and depth information obtained from views captured from a low-cost RGBD camera and sparsely-located color cameras. Our system begins with generating dense depth information of each target image by using Structure from Motion and Joint Bilateral Upsampling. We formulate the multi-view object segmentation as the Markov Random Field energy optimization on the graph constructed from the superpixels. To ensure inter-view consistency of the segmentation results between color images that have too few color features, our local mapping method generates dense inter-view geometric correspondences by using the dense depth images. Finally, the pixel-based optimization step refines the boundaries of the results obtained from the superpixel-based binary segmentation. We evaluate the validity of our method under various capture conditions such as numbers of views, rotations, and distances between cameras. We compared our method with the state-of-the-art methods that use the standard multi-view datasets. The comparison verified that the proposed method works very efficiently especially in a sparse wide-baseline capture environment.

[1]  Bernt Schiele,et al.  Video Segmentation with Superpixels , 2012, ACCV.

[2]  Jean-Yves Guillemaut,et al.  Joint Multi-Layer Segmentation and Reconstruction for Free-Viewpoint Video Applications , 2011, International Journal of Computer Vision.

[3]  Michael G. Strintzis,et al.  3-D model-based segmentation of videoconference image sequences , 1998, IEEE Trans. Circuits Syst. Video Technol..

[4]  Woontack Woo,et al.  Silhouette Segmentation in Multiple Views , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Marie-Pierre Jolly,et al.  Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[6]  Vincent Lepetit,et al.  DAISY: An Efficient Dense Descriptor Applied to Wide-Baseline Stereo , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Roberto Cipolla,et al.  Automatic Object Segmentation from Calibrated Images , 2011, 2011 Conference for Visual Media Production.

[8]  Pascal Fua,et al.  SLIC Superpixels Compared to State-of-the-Art Superpixel Methods , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Ruigang Yang,et al.  TofCut: Towards Robust Real-time Foreground Extraction using Time-of-flight Camera , 2016 .

[10]  Bodo Rosenhahn,et al.  Temporally Consistent Superpixels , 2013, 2013 IEEE International Conference on Computer Vision.

[11]  Ryan Crabb,et al.  Real-time foreground segmentation via range and color imaging , 2008, 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[12]  Jean-Yves Guillemaut,et al.  Wide-baseline multi-view video segmentation for 3D reconstruction , 2010, 3DVP '10.

[13]  W. Eric L. Grimson,et al.  Adaptive background mixture models for real-time tracking , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[14]  A. Criminisi,et al.  Bilayer Segmentation of Live Video , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[15]  Andrew Blake,et al.  "GrabCut" , 2004, ACM Trans. Graph..

[16]  Alex Pentland,et al.  Pfinder: real-time tracking of the human body , 1996, Proceedings of the Second International Conference on Automatic Face and Gesture Recognition.

[17]  Trevor Darrell,et al.  Background estimation and removal based on range and color , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[18]  Patrick Pérez,et al.  Multi-view Object Segmentation in Space and Time , 2013, 2013 IEEE International Conference on Computer Vision.

[19]  Kentaro Toyama,et al.  Wallflower: principles and practice of background maintenance , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[20]  Changchang Wu,et al.  Towards Linear-Time Incremental Structure from Motion , 2013, 2013 International Conference on 3D Vision.

[21]  Richard Szeliski,et al.  Multiple View Object Cosegmentation Using Appearance and Stereo Cues , 2012, ECCV.

[22]  Andrew Blake,et al.  Bi-layer segmentation of binocular stereo video , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[23]  Hans-Peter Seidel,et al.  Markerless Motion Capture with unsynchronized moving cameras , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Vladimir Kolmogorov,et al.  Object cosegmentation , 2011, CVPR 2011.

[25]  Vladimir Kolmogorov,et al.  An experimental comparison of min-cut/max- flow algorithms for energy minimization in vision , 2001, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Long Quan,et al.  A quasi-dense approach to surface reconstruction from uncalibrated images , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  Dani Lischinski,et al.  Joint bilateral upsampling , 2007, SIGGRAPH 2007.

[28]  Patrick Pérez,et al.  Sparse Multi-View Consistency for Object Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Long Quan,et al.  Silhouette Extraction from Multiple Images of Unknown Background , 2004 .

[30]  Jean Ponce,et al.  Discriminative clustering for image co-segmentation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[31]  Jiebo Luo,et al.  iCoseg: Interactive co-segmentation with intelligent scribble guidance , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.