Learning Semantic Keypoint Representations for Door Opening Manipulation

We consider the problem of enabling a robot to autonomously open previously unseen doors. Prior works either use model-based methods that rely heavily on accurate kinematic models, or learn a policy from scratch through trial and error, which cannot generalize to large variations in the shape and location of doors. In this letter, we propose a novel method for opening unseen doors with no prior knowledge of the door model, which leverages semantic 3D keypoints as door handle representations to generate the end-effector trajectory from a motion planner. The keypoint representations are predicted from raw visual input by a deep neural network and provide a concise, semantic description of the handle that determines the grasp pose and the subsequent motion planning. In contrast to existing works that require known object models or significant manual effort for data collection, we present a data augmentation technique that automatically generates large amounts of realistic-looking synthetic data with almost no human labeling effort. The network is trained on an augmented dataset consisting of a large amount of synthetic data and a small amount of real data. Evaluation results show that our proposed method outperforms state-of-the-art pose-based methods on real test sets in terms of perception metrics. Hardware experiments demonstrate that our proposed method achieves a 94.2% success rate in opening 6 previously unseen doors with significant shape variations under different environments and conditions.
