Occlusion handling using semantic segmentation and visibility-based rendering for mixed reality

Real-time occlusion handling is a major problem in outdoor mixed reality systems because the complexity of the scene makes it computationally expensive. Using segmentation alone, it is difficult to accurately render a virtual object occluded by complex objects such as vegetation. In this paper, we propose a novel occlusion handling method for real-time mixed reality given a monocular image and an inaccurate depth map. We modify the intensity of the overlaid CG object based on the texture of the underlying real scene using visibility-based rendering. To determine the appropriate level of visibility, we use CNN-based semantic segmentation and assign labels to the real scene based on the complexity of object boundaries and texture. We then combine the segmentation results with the foreground probability map from the depth image to solve for the appropriate blending parameter for visibility-based rendering. Our results show improved occlusion handling under inaccurate foreground segmentation compared to existing blending-based methods.
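To make the idea of combining a semantic label map with a foreground probability map more concrete, the following is a minimal sketch and not the method of the paper. It substitutes plain linear alpha blending for the visibility-based rendering described above, and every name in it (`CLASS_COMPLEXITY`, `blend_weight`, `composite`), the three example classes, and the 0.5 threshold are assumptions made for illustration only.

```python
import numpy as np

# Hypothetical per-class complexity weights in [0, 1] (assumed values):
# 0 = simple boundary (e.g. building), 1 = complex boundary (e.g. vegetation).
CLASS_COMPLEXITY = np.array([0.0, 0.5, 1.0])


def blend_weight(fg_prob, labels, class_complexity=CLASS_COMPLEXITY):
    """Per-pixel opacity of the virtual object (illustrative rule only).

    fg_prob: (H, W) probability that the real scene is in front of the
             virtual object, e.g. derived from a noisy depth map.
    labels:  (H, W) integer class index per pixel from a segmentation net.
    """
    c = class_complexity[labels]               # complexity weight per pixel
    hard = (fg_prob < 0.5).astype(np.float64)  # binary visibility decision
    soft = 1.0 - fg_prob                       # graded visibility
    # Simple classes trust the hard decision; complex classes blend softly.
    return (1.0 - c) * hard + c * soft


def composite(real, virtual, virtual_mask, alpha):
    """Linear alpha blend of the virtual layer over the real image.

    real, virtual: (H, W, 3) float images; virtual_mask, alpha: (H, W).
    """
    a = (alpha * virtual_mask)[..., None]      # restrict to rendered pixels
    return a * virtual + (1.0 - a) * real


if __name__ == "__main__":
    fg_prob = np.array([[0.2, 0.8], [0.2, 0.8]])
    labels = np.array([[0, 0], [2, 2]])        # top row simple, bottom complex
    alpha = blend_weight(fg_prob, labels)
    out = composite(np.zeros((2, 2, 3)), np.ones((2, 2, 3)), np.ones((2, 2)), alpha)
    print(alpha)
    print(out[..., 0])
```

In this sketch, pixels labelled with a simple-boundary class receive a binary occlusion decision, while pixels labelled with a complex class such as vegetation receive a graded opacity, so an error in the foreground probability shows up as partial transparency rather than a hard, misplaced edge.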
