Multi-view foreground segmentation based on spatial consistency constraints

Recent research on object segmentation mostly concentrates on single-view images or on objects in 3D settings. In this paper, we present a novel method for efficient multi-view foreground object segmentation that uses spatial consistency across adjacent views as a constraint to generate consistent masks. Even when the conventional segmentation results for the individual views are relatively accurate, there are always inconsistent regions, where the mask boundaries differ over the same area across multiple views. The central idea of our method is to use the camera parameters to guide a refocusing procedure, in which each instance is refocused across the different views using multi-view projections. The refocused images are then fed into an instance segmentation network to predict the bounding box and object mask. Finally, the network output serves as prior information for Gaussian mixture models (GMMs) to obtain a more accurate segmentation. While many concrete implementations of this general idea are feasible, this simple and efficient approach already achieves satisfactory results. Experimental results demonstrate, both qualitatively and quantitatively, that the proposed method produces masks containing fewer background pixels, which ultimately improves 3D display quality. We hope this simple and effective method will benefit future research on related tasks.
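To make the refocusing idea concrete, the following is a minimal sketch under a strong simplifying assumption that the abstract does not state: the cameras form a planar, parallel array with shared intrinsics, so the multi-view projection onto a focal plane at depth Z reduces to an integer pixel shift of focal_px * b / Z per view (classic shift-and-add synthetic-aperture refocusing). The function name `refocus`, its parameters, and the camera model are hypothetical and are not the authors' implementation. A general setup would instead warp each view with a plane-induced homography built from the full camera parameters.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]  # grayscale image, indexed [row][col]


def refocus(views: Sequence[Image],
            cam_offsets: Sequence[Tuple[float, float]],
            focal_px: float,
            depth: float) -> Image:
    """Shift-and-add refocusing onto the plane at `depth` (hypothetical sketch).

    Assumption: every camera has the same intrinsics and orientation and is
    translated by (bx, by) from the reference camera at (0, 0). A point on the
    focal plane that appears at column c in the reference view then appears at
    column c - focal_px * bx / depth in the view with offset bx (same for rows).
    `depth` must be positive.
    """
    h, w = len(views[0]), len(views[0][0])
    acc = [[0.0] * w for _ in range(h)]
    cnt = [[0] * w for _ in range(h)]
    for img, (bx, by) in zip(views, cam_offsets):
        # Disparity of the focal plane in this view, rounded to whole pixels
        # to keep the sketch free of interpolation code.
        dx = round(focal_px * bx / depth)
        dy = round(focal_px * by / depth)
        for r in range(h):
            for c in range(w):
                # Pixel in this view that images the same focal-plane point
                # as pixel (r, c) of the reference view.
                rs, cs = r - dy, c - dx
                if 0 <= rs < h and 0 <= cs < w:
                    acc[r][c] += img[rs][cs]
                    cnt[r][c] += 1
    # Average the aligned samples: points on the focal plane stay sharp,
    # points at other depths are spread out (defocused).
    return [[acc[r][c] / cnt[r][c] if cnt[r][c] else 0.0 for c in range(w)]
            for r in range(h)]
```

For example, with three horizontally offset views and a bright point whose disparity is 2 pixels, refocusing at the matching depth returns the point at full intensity, while refocusing at another depth averages it with background and blurs it. In the pipeline described above, such refocused images would be the input to the instance segmentation network.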
