Paper Information - DRACO: Weakly Supervised Dense Reconstruction And Canonicalization of Objects


We present DRACO, a method for Dense Reconstruction And Canonicalization of Object shape from one or more RGB images. Canonical shape reconstruction, that is, estimating 3D object shape in a coordinate space canonicalized for scale, rotation, and translation, is an emerging paradigm that holds promise for a multitude of robotic applications. Prior approaches either rely on painstakingly gathered dense 3D supervision or produce only sparse canonical representations, which limits their real-world applicability. DRACO performs dense canonicalization using only weak supervision, in the form of camera poses and semantic keypoints, at train time. During inference, DRACO predicts dense object-centric depth maps in a canonical coordinate space using only one or more RGB images of an object. Extensive experiments on canonical shape reconstruction and pose estimation show that DRACO is competitive with or superior to fully supervised methods.
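To make the idea of a coordinate space "canonicalized for scale and translation" concrete, here is a minimal sketch. It is not DRACO's method: the helper name `normalize_translation_and_scale` and the convention of centering the bounding box and scaling its diagonal to 1 are assumptions chosen for illustration. Rotation canonicalization is deliberately left out, since it needs a learned or annotated reference orientation.

```python
import numpy as np


def normalize_translation_and_scale(points):
    """Center a 3D point set and scale its bounding-box diagonal to 1.

    Hypothetical helper for illustration only; rotation is left untouched.
    """
    points = np.asarray(points, dtype=float)
    lo, hi = points.min(axis=0), points.max(axis=0)
    center = (lo + hi) / 2.0            # bounding-box center
    diagonal = np.linalg.norm(hi - lo)  # bounding-box diagonal length
    if diagonal == 0.0:
        raise ValueError("degenerate point set: all points coincide")
    return (points - center) / diagonal


# The same shape observed at two different scales and positions.
shape_a = np.array([[0.0, 0.0, 0.0], [1.0, 2.0, 2.0], [0.5, 1.0, 0.0]])
shape_b = 5.0 * shape_a + np.array([10.0, -4.0, 7.0])

canon_a = normalize_translation_and_scale(shape_a)
canon_b = normalize_translation_and_scale(shape_b)
```

Under this convention both observations map to the same coordinates, which is the property a canonical space is meant to provide for scale and translation.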
