Localization and Completion for 3D Object Interactions

Deciding what objects to add to an existing scene, and where to put them, is a common task in scene synthesis and in robot and character motion planning. Existing frameworks require either hand-crafted features tailored to the task or a full volumetric analysis, which can be memory-intensive and imprecise. In this paper, we propose a data-driven framework that finds a suitable location in a scene and then places an appropriate object there. Our approach is inspired by computer vision techniques for localizing objects in images: using an all-directional depth image (ADD-image) that encodes the 360-degree field of view from sample points in the scene, our system regresses from these images to the positions where the new object can be placed. Given several candidate areas around the host object in the scene, our system predicts the partner object whose geometry fits the host object well. Our approach is highly parallel and memory-efficient, and is especially suitable for handling interactions between large and small objects. We show examples in which the system hangs bags on hooks, fits chairs in front of desks, places objects on shelves, inserts flowers into vases, and puts hangers onto a laundry rack.
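To make the idea of an all-directional depth image more concrete, the following is a minimal sketch of how a 360-degree depth image could be rendered from one sample point. It is not the construction used in the paper, which this abstract does not specify. The equirectangular (polar angle by azimuth) parameterization, the use of spheres as stand-in scene geometry, and the names `ray_sphere_depth`, `add_image`, and `max_depth` are all assumptions made for illustration.

```python
import math


def ray_sphere_depth(origin, direction, center, radius):
    """Distance along a unit-length ray to a sphere, or None if the ray misses."""
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(x * d for x, d in zip(oc, direction))
    c = sum(x * x for x in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    for t in (-b - root, -b + root):  # try the nearer intersection first
        if t >= 0.0:
            return t
    return None


def add_image(origin, spheres, height=16, width=32, max_depth=10.0):
    """Render a height x width depth image covering all directions around origin.

    Hypothetical layout: rows sample the polar angle in (0, pi), columns sample
    the azimuth in (0, 2*pi). Rays that hit nothing are assigned max_depth.
    """
    image = []
    for i in range(height):
        theta = math.pi * (i + 0.5) / height
        row = []
        for j in range(width):
            phi = 2.0 * math.pi * (j + 0.5) / width
            direction = (
                math.sin(theta) * math.cos(phi),
                math.sin(theta) * math.sin(phi),
                math.cos(theta),
            )
            depth = max_depth
            for center, radius in spheres:
                t = ray_sphere_depth(origin, direction, center, radius)
                if t is not None and t < depth:
                    depth = t
            row.append(depth)
        image.append(row)
    return image


if __name__ == "__main__":
    # A toy scene: one unit sphere five units along the x axis from the sample point.
    image = add_image((0.0, 0.0, 0.0), [((5.0, 0.0, 0.0), 1.0)])
    print(min(d for row in image for d in row))
```

Each pixel is computed independently of the others, which is consistent with the abstract's remark that the approach parallelizes well. The regression from such images to placement positions is a learned component and is not sketched here.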
