Visual Identification of Articulated Object Parts

As autonomous robots interact with and navigate around real-world environments such as homes, it is useful to reliably identify and manipulate articulated objects, such as doors and cabinets. Many prior works in object articulation identification require manipulation of the object, either by the robot or a human. While recent works have addressed predicting articulation types from visual observations alone, they often assume prior knowledge of category-level kinematic motion models or a sequence of observations in which the articulated parts move according to their kinematic constraints. In this work, we propose training a neural network through large-scale domain randomization to identify the articulation type of object parts from a single image observation. Training data is generated via photorealistic rendering in simulation. Our proposed model predicts motion residual flows of object parts, and these residuals are used to determine the articulation type and parameters. We train the network on six object categories with 149 objects and 100K rendered images, achieving an accuracy of 82.5%. Experiments show our method generalizes to novel object categories in simulation and can be applied to real-world images without fine-tuning.
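The abstract does not specify how the predicted motion residual flows are converted into an articulation type and its parameters. As a rough, hypothetical illustration of this kind of post-processing (not the paper's actual procedure), the sketch below fits both a prismatic (pure translation) and a revolute (small rotation about an axis) motion model to per-point residual vectors and keeps the better fit; the function name, the least-squares formulation, and the ratio-based selection heuristic are all assumptions.

```python
import numpy as np

def fit_articulation(points, residuals, ratio=0.5):
    """Hypothetical sketch: classify one object part as prismatic or
    revolute from predicted per-point motion residuals, and recover
    the motion parameters.

    points:    (N, 3) 3D locations sampled on the part
    residuals: (N, 3) predicted small displacements of those locations
    """
    # Prismatic hypothesis: every point shares a single translation t.
    t = residuals.mean(axis=0)
    prismatic_err = np.linalg.norm(residuals - t, axis=1).mean()

    # Revolute hypothesis: for a small rotation about an axis through a
    # point c with scaled axis w, residuals obey r_i = w x p_i + b,
    # where b = c x w. This is linear in (w, b): solve by least squares.
    N = len(points)
    A = np.zeros((3 * N, 6))
    for i, p in enumerate(points):
        # Matrix M with M @ w == w x p (cross product as a linear map).
        A[3*i:3*i+3, :3] = [[0.0,   p[2], -p[1]],
                            [-p[2], 0.0,   p[0]],
                            [p[1], -p[0],  0.0]]
        A[3*i:3*i+3, 3:] = np.eye(3)
    x, *_ = np.linalg.lstsq(A, residuals.reshape(-1), rcond=None)
    w, b = x[:3], x[3:]
    revolute_err = np.linalg.norm(
        residuals - (A @ x).reshape(N, 3), axis=1).mean()

    # The revolute model nests the prismatic one (w -> 0), so demand a
    # clearly better fit before declaring the joint revolute.
    if revolute_err < ratio * prismatic_err:
        axis = w / (np.linalg.norm(w) + 1e-12)    # rotation axis direction
        point = np.cross(w, b) / (w @ w + 1e-12)  # axis point nearest origin
        return "revolute", axis, point
    return "prismatic", t / (np.linalg.norm(t) + 1e-12), None
```

Intuitively, residual flow on a cabinet door swings tangentially about the hinge, so the revolute fit wins, while drawer residuals are parallel and the prismatic fit wins. In practice, such a fit would likely need to be made robust to outlier predictions, e.g. via RANSAC, with the selection threshold calibrated on held-out data.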
