Explicit Occlusion Reasoning for 3D Object Detection

Consider the problem of recognizing an object that is partially occluded in an image. The visible portions are likely to match learned appearance models for the object, but hidden portions will not. The (hypothetical) ideal system would consider only the visible object information, correctly ignoring all occluded regions. In purely 2D recognition, this requires inferring the occlusion present, which is a significant challenge since the number of possible occlusion masks is, in principle, exponential. We simplify the problem by considering only a small subset of the most likely occlusions (top, bottom, left, and right halves) and noting that some mismatch is tolerable. We train partial-object detectors tailored exactly to each of these few cases. In addition, we reason about objects in 3D and incorporate sensed geometry, such as from an RGB-D camera, along with visual imagery. This allows explicit occlusion masks to be constructed for each object hypothesis. The masks specify how much to trust each partial template, based on that template's overlap with visible object regions. Only the visible evidence contributes to our object reasoning.
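The visibility-weighted combination described above can be sketched as follows. This is a minimal illustration under assumed conventions, not the paper's implementation: the function name, the use of a boolean visibility mask over the hypothesis box, and the normalized weighted-average combination rule are all hypothetical choices made here for clarity.

```python
import numpy as np

def combine_partial_scores(visible_mask, partial_scores):
    """Weight partial-detector scores by the visible fraction of each half.

    visible_mask: 2D boolean array over the object's bounding box, True
        where the sensed 3D geometry indicates the object is unoccluded.
    partial_scores: dict mapping 'top'/'bottom'/'left'/'right' to the raw
        score of the corresponding partial-object detector.
    (Both inputs and the averaging rule are illustrative assumptions.)
    """
    h, w = visible_mask.shape
    halves = {
        'top':    visible_mask[: h // 2, :],
        'bottom': visible_mask[h // 2 :, :],
        'left':   visible_mask[:, : w // 2],
        'right':  visible_mask[:, w // 2 :],
    }
    weighted_sum, total_weight = 0.0, 0.0
    for name, region in halves.items():
        weight = float(region.mean())  # fraction of this half that is visible
        weighted_sum += weight * partial_scores[name]
        total_weight += weight
    # Only visible evidence contributes; a fully occluded hypothesis scores 0.
    return weighted_sum / total_weight if total_weight > 0 else 0.0
```

For example, a hypothesis whose left half is fully visible and right half fully occluded gives full weight to the left-half detector, half weight to the top and bottom detectors (each of which is half visible), and zero weight to the right-half detector.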
