DRACO: Weakly Supervised Dense Reconstruction And Canonicalization of Objects

We present DRACO, a method for Dense Reconstruction And Canonicalization of Object shape from one or more RGB images. Canonical shape reconstruction, that is, estimating 3D object shape in a coordinate space canonicalized for scale, rotation, and translation, is an emerging paradigm that holds promise for many robotic applications. Prior approaches either rely on painstakingly gathered dense 3D supervision or produce only sparse canonical representations, which limits their real-world applicability. DRACO performs dense canonicalization using only weak supervision at training time, in the form of camera poses and semantic keypoints. At inference time, DRACO predicts dense object-centric depth maps in a canonical coordinate space from nothing more than one or more RGB images of an object. Extensive experiments on canonical shape reconstruction and pose estimation show that DRACO is competitive with or superior to fully supervised methods.
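To make "a coordinate space canonicalized for scale, rotation, and translation" concrete, here is a minimal, generic sketch. It is not DRACO's implementation. It assumes a hypothetical setup in which each foreground pixel has both a camera-frame 3D point (a depth map lifted through pinhole intrinsics) and a predicted canonical coordinate. Under that assumption, object pose and scale are the similarity transform (s, R, t) relating the two point sets, which we recover here with Umeyama's closed-form least-squares alignment. The function names, intrinsics, and synthetic data are illustrative assumptions.

```python
import numpy as np

# Hypothetical illustration of canonical coordinates vs. camera coordinates;
# not DRACO's actual pipeline.


def backproject(depth, fx, fy, cx, cy):
    """Lift an HxW depth map to camera-frame 3D points (pinhole model).

    Pixels with depth <= 0 are treated as background and dropped.
    Returns an (N, 3) array (row-major pixel order) and the foreground mask.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]  # v = row (y) index, u = column (x) index
    mask = depth > 0
    z = depth[mask]
    x = (u[mask] - cx) * z / fx
    y = (v[mask] - cy) * z / fy
    return np.stack([x, y, z], axis=1), mask


def umeyama(src, dst):
    """Least-squares similarity (s, R, t) such that dst ~= s * R @ src + t.

    src, dst: (N, 3) arrays of corresponding points.
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    n = src.shape[0]
    cov = dst_c.T @ src_c / n  # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0  # avoid returning a reflection
    R = U @ S @ Vt
    var_s = (src_c ** 2).sum() / n  # variance of the source points
    s = np.trace(np.diag(D) @ S) / var_s
    t = mu_d - s * R @ mu_s
    return s, R, t


def to_canonical(points, s, R, t):
    """Map camera-frame points into the canonical frame: R^T (p - t) / s."""
    return (points - t) @ R / s


# Synthetic check with made-up canonical points and a known pose.
rng = np.random.default_rng(0)
canon = rng.uniform(-0.5, 0.5, size=(200, 3))
theta = np.deg2rad(30.0)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta), np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
s_true, t_true = 2.0, np.array([0.1, -0.2, 1.5])
cam = s_true * canon @ R_true.T + t_true  # canonical -> camera frame

s, R, t = umeyama(canon, cam)
print(np.isclose(s, s_true), np.allclose(R, R_true), np.allclose(t, t_true))
```

With noise-free correspondences the recovered transform matches the one used to generate the data. With real predictions, outliers would typically call for a robust wrapper such as RANSAC around the same closed-form solve.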
