Occlusion Boundary: A Formal Definition & Its Detection via Deep Exploration of Context

Occlusion boundaries contain rich perceptual information about the underlying scene structure and provide important cues in many visual perception-related tasks such as object recognition, segmentation, motion estimation, scene understanding, and autonomous navigation. However, there is no formal definition of occlusion boundaries in the literature, and state-of-the-art occlusion boundary detection is still suboptimal. With this in mind, in this paper we propose a formal definition of occlusion boundaries for related studies. Further, based on a novel idea, we develop two concrete approaches with different characteristics to detect occlusion boundaries in video sequences via enhanced exploration of contextual information (e.g., local structural boundary patterns, observations from surrounding regions, and temporal context) with deep models and conditional random fields. Experimental evaluations of our methods on two challenging occlusion boundary benchmarks (CMU and VSB100) demonstrate that our detectors significantly outperform the current state-of-the-art. Finally, we empirically assess the roles of several important components of the proposed detectors to validate the rationale behind these approaches.

[1]  Jean Ponce,et al.  Learning mid-level features for recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[2]  M. Opper,et al.  Comparing the Mean Field Method and Belief Propagation for Approximate Inference in MRFs , 2001 .

[3]  Thomas Brox,et al.  Occlusions, Motion and Depth Boundaries with a Generic Network for Disparity, Optical Flow or Scene Flow Estimation , 2018, ECCV.

[4]  Yi Li,et al.  R-FCN: Object Detection via Region-based Fully Convolutional Networks , 2016, NIPS.

[5]  Daphna Weinshall,et al.  Motion Segmentation and Depth Ordering Using an Occlusion Detector , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Cordelia Schmid,et al.  DeepFlow: Large Displacement Optical Flow with Deep Matching , 2013, 2013 IEEE International Conference on Computer Vision.

[7]  Alan L. Yuille,et al.  Occlusion Boundary Detection Using Pseudo-depth , 2010, ECCV.

[8]  Thomas Brox,et al.  Point-Wise Mutual Information-Based Video Segmentation with High Temporal Consistency , 2016, ECCV Workshops.

[9]  Charless C. Fowlkes,et al.  Contour Detection and Hierarchical Image Segmentation , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  David J. Fleet,et al.  Probabilistic Detection and Tracking of Motion Boundaries , 2000, International Journal of Computer Vision.

[11]  Seth J. Teller,et al.  Particle Video: Long-Range Motion Estimation Using Point Trajectories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[12]  David V. Anderson,et al.  Finding Temporally Consistent Occlusion Boundaries in Videos Using Geometric Context , 2015, 2015 IEEE Winter Conference on Applications of Computer Vision.

[13]  Xi Wang,et al.  High-Resolution Stereo Datasets with Subpixel-Accurate Ground Truth , 2014, GCPR.

[14]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[15]  Jean Ponce,et al.  Computer Vision: A Modern Approach , 2002 .

[16]  B. S. Manjunath,et al.  Probabilistic occlusion boundary detection on spatio-temporal lattices , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[17]  Stefano Soatto,et al.  Detachable Object Detection: Segmentation and Depth Ordering from Short-Baseline Video , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Nando de Freitas,et al.  A Statistical Model for General Contextual Object Recognition , 2004, ECCV.

[19]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[20]  C. Lawrence Zitnick,et al.  Structured Forests for Fast Edge Detection , 2013, 2013 IEEE International Conference on Computer Vision.

[21]  Zhuowen Tu,et al.  Auto-Context and Its Application to High-Level Vision Tasks and 3D Brain Image Segmentation , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22]  Michael J. Black,et al.  Occlusion Boundary Detection via Deep Exploration of Context , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Thomas Brox,et al.  FlowNet: Learning Optical Flow with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[24]  Martial Hebert,et al.  Occlusion Boundaries from Motion: Low-Level Detection and Mid-Level Reasoning , 2009, International Journal of Computer Vision.

[25]  Abhinav Gupta,et al.  Designing deep networks for surface normal estimation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Cuneyt Akinlar,et al.  An Occlusion-Resistant Ellipse Detection Method by Joining Coelliptic Arcs , 2016, ECCV.

[27]  Nitish Srivastava,et al.  Improving neural networks by preventing co-adaptation of feature detectors , 2012, ArXiv.

[28]  Sinisa Todorovic,et al.  Boundary Flow: A Siamese Network that Predicts Boundary Motion Without Training on Motion , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[29]  Pushmeet Kohli,et al.  Markov Random Fields for Vision and Image Processing , 2011 .

[30]  Thomas Brox,et al.  A Unified Video Segmentation Benchmark: Annotation, Metrics and Analysis , 2013, 2013 IEEE International Conference on Computer Vision.

[31]  Alexei A. Efros,et al.  Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships , 2009, NIPS.

[32]  Xiaogang Wang,et al.  Deep Learning Face Representation from Predicting 10,000 Classes , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[33]  Heesoo Myeong,et al.  Learning object relationships via graph-based context model , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  Yu Liu,et al.  Learning Relaxed Deep Supervision for Better Edge Detection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  C. Lawrence Zitnick,et al.  Fast Edge Detection Using Structured Forests , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[37]  Gabriel J. Brostow,et al.  Learning to find occlusion regions , 2011, CVPR 2011.

[38]  Alexei A. Efros,et al.  Recovering Occlusion Boundaries from a Single Image , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[39]  Andrew W. Fitzgibbon,et al.  Learning spatiotemporal T-junctions for occlusion detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[40]  Irving Biederman,et al.  On the Semantics of a Glance at a Scene , 2017 .

[41]  Michael J. Black,et al.  A Naturalistic Open Source Movie for Optical Flow Evaluation , 2012, ECCV.

[42]  Jitendra Malik,et al.  Occlusion boundary detection and figure/ground assignment from optical flow , 2011, CVPR 2011.

[43]  Cordelia Schmid,et al.  Learning to detect Motion Boundaries , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Cristian Sminchisescu,et al.  Generalized Boundaries from Multiple Image Interpretations , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[45]  Jitendra Malik,et al.  Large Displacement Optical Flow: Descriptor Matching in Variational Motion Estimation , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[46]  Alan L. Yuille,et al.  DOC: Deep OCclusion Estimation from a Single Image , 2015, ECCV.

[47]  Ashutosh Saxena,et al.  Learning Depth from Single Monocular Images , 2005, NIPS.

[48]  Thomas Brox,et al.  FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Kurt Keutzer,et al.  Dense Point Trajectories by GPU-Accelerated Large Displacement Optical Flow , 2010, ECCV.

[50]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[51]  Nikos Komodakis,et al.  Markov Random Field modeling, inference & learning in computer vision & image understanding: A survey , 2013, Comput. Vis. Image Underst..

[52]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[53]  Jean Ponce,et al.  Learning a convolutional neural network for non-uniform motion blur removal , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[54]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[55]  Jia Deng,et al.  RAFT: Recurrent All-Pairs Field Transforms for Optical Flow , 2020, ECCV.

[56]  Michael J. Black Combining Intensity and Motion for Incremental Segmentation and Tracking Over Long Image Sequences , 1992, ECCV.

[57]  Martial Hebert,et al.  Local detection of occlusion boundaries in video , 2009, Image Vis. Comput..

[58]  Victor S. Lempitsky,et al.  N4-Fields: Neural Network Nearest Neighbor Fields for Image Transforms , 2014, ArXiv.

[59]  Truong Q. Nguyen,et al.  An Online Learning Approach to Occlusion Boundary Detection , 2012, IEEE Transactions on Image Processing.

[60]  Peter Kontschieder,et al.  Structured class-labels in random forests for semantic image labelling , 2011, 2011 International Conference on Computer Vision.

[61]  Xiang Bai,et al.  Richer Convolutional Features for Edge Detection , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[62]  Michael J. Black,et al.  Constraints for the Early Detection of Discontinuity from Motion , 1990, AAAI.

[63]  Yan Wang,et al.  DeepContour: A deep convolutional feature learned by positive-sharing loss for contour detection , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[64]  Xiaohui Liang,et al.  DOOBNet: Deep Object Occlusion Boundary Detection from an Image , 2018, ACCV.

[65]  Brian Taylor,et al.  Causal video object segmentation from persistence of occlusions , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[66]  Shiguang Shan,et al.  Shape driven kernel adaptation in Convolutional Neural Network for robust facial trait recognition , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[67]  Kang Zheng,et al.  Combining local appearance and holistic view: Dual-Source Deep Neural Networks for human pose estimation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[68]  Geoffrey E. Hinton,et al.  Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[69]  Seungkyu Lee,et al.  Bi-layer inpainting for novel view synthesis , 2011, 2011 18th IEEE International Conference on Image Processing.

[70]  Michael J. Black,et al.  A Quantitative Analysis of Current Practices in Optical Flow Estimation and the Principles Behind Them , 2013, International Journal of Computer Vision.

[71]  Jitendra Malik,et al.  Learning to detect natural image boundaries using local brightness, color, and texture cues , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[72]  Yoram Singer,et al.  Using and combining predictors that specialize , 1997, STOC '97.

[73]  Philippe Salembier,et al.  Monocular Depth Ordering Using T-Junctions and Convexity Occlusion Cues , 2013, IEEE Transactions on Image Processing.

[74]  Hanumant Singh,et al.  Camouflaging an Object from Many Viewpoints , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.