Paper information - Learning 3D Part Assembly from a Single Image

Autonomous assembly is a crucial capability for robots in many applications. For this task, several problems such as obstacle avoidance, motion planning, and actuator control have been extensively studied in robotics. However, when it comes to task specification, the space of possibilities remains underexplored. To this end, we introduce a novel problem, single-image-guided 3D part assembly, along with a learning-based solution. We study this problem in the setting of furniture assembly from a given complete set of parts and a single image depicting the entire assembled object. Multiple challenges exist in this setting, including handling ambiguity among parts (e.g., slats in a chair back and leg stretchers) and 3D pose prediction for parts and part subassemblies, whether visible or occluded. We address these issues by proposing a two-module pipeline that leverages strong 2D-3D correspondences and assembly-oriented graph message-passing to infer part relationships. In experiments with a PartNet-based synthetic benchmark, we demonstrate the effectiveness of our framework as compared with three baseline approaches.
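The graph message-passing idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the fully connected part graph, concatenation-based messages with mean aggregation, weights shared across rounds, the 7-D pose output (translation plus unit quaternion), and the random stand-in weights are all assumptions made for illustration, and every function name (`message_passing_round`, `predict_pose`, `assemble`, etc.) is hypothetical. Per-part feature vectors are taken as given; in a real system they would come from learned part and image encoders, and the weights would be trained.

```python
# Hypothetical sketch, not the paper's code: random weights stand in for
# learned parameters, and all names here are illustrative assumptions.
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def relu(v: Vector) -> Vector:
    """Element-wise ReLU."""
    return [max(0.0, x) for x in v]


def matvec(w: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product w @ v."""
    return [sum(a * b for a, b in zip(row, v)) for row in w]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Stand-in for learned weights (assumption: a real model trains these)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def message_passing_round(h: List[Vector], w_msg: Matrix, w_upd: Matrix) -> List[Vector]:
    """One round over a fully connected part graph:
    m_ij = relu(W_msg [h_i ; h_j]),  h_i' = relu(W_upd [h_i ; mean_j m_ij]).
    """
    n = len(h)
    updated = []
    for i in range(n):
        # Message from every other part j to part i; list "+" concatenates features.
        msgs = [relu(matvec(w_msg, h[i] + h[j])) for j in range(n) if j != i]
        # Mean aggregation keeps the update independent of part ordering;
        # a lone part (no neighbours) receives a zero message.
        if msgs:
            agg = [sum(col) / len(msgs) for col in zip(*msgs)]
        else:
            agg = [0.0] * len(w_msg)
        updated.append(relu(matvec(w_upd, h[i] + agg)))
    return updated


def predict_pose(h_i: Vector, w_pose: Matrix) -> Tuple[Vector, Vector]:
    """Map a part feature to (translation xyz, unit quaternion wxyz)."""
    out = matvec(w_pose, h_i)
    translation, quat = out[:3], out[3:7]
    norm = math.sqrt(sum(q * q for q in quat))
    # Fall back to the identity rotation if the raw quaternion is degenerate.
    quat = [q / norm for q in quat] if norm > 1e-8 else [1.0, 0.0, 0.0, 0.0]
    return translation, quat


def assemble(part_features: List[Vector], rounds: int = 3,
             seed: int = 0) -> List[Tuple[Vector, Vector]]:
    """Run a few message-passing rounds, then predict one pose per part."""
    d = len(part_features[0])
    rng = random.Random(seed)
    w_msg = random_matrix(d, 2 * d, rng)
    w_upd = random_matrix(d, 2 * d, rng)
    w_pose = random_matrix(7, d, rng)
    h = part_features
    for _ in range(rounds):
        h = message_passing_round(h, w_msg, w_upd)
    return [predict_pose(h_i, w_pose) for h_i in h]
```

Mean aggregation makes each part's update independent of the order in which parts are listed, which matters because a set of parts has no canonical ordering. As a side effect, two parts with identical features (for instance, two identical chair-back slats) always receive identical poses here; this sketch has no image input, so it cannot break such ties, which is the kind of ambiguity among parts the abstract highlights.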
