Rearrangement with Nonprehensile Manipulation Using Deep Reinforcement Learning

Rearranging objects on a tabletop surface by means of nonprehensile manipulation is a task that requires skillful interaction with the physical world. Usually, this is achieved by precisely modeling the physical properties of the objects, the robot, and the environment for explicit planning. In contrast, because explicitly modeling the physical environment is not always feasible and involves various uncertainties, we learn a nonprehensile rearrangement strategy with deep reinforcement learning based only on visual feedback. To this end, we model the task with rewards and train a deep Q-network. Our potential-field-based heuristic exploration strategy reduces the number of collisions that lead to suboptimal outcomes, and we actively balance the training set to avoid bias toward poor examples. Compared to uniform exploration and standard experience replay, our training process leads to faster learning and better task performance. We provide empirical evidence from simulation that our method achieves a success rate of 85%, show that our system can cope with sudden changes in the environment, and compare our performance with human-level performance.
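For concreteness, the following is a minimal Python sketch of the two training-side ideas the abstract describes: a potential-field heuristic that biases exploratory pushes toward the goal and away from collision-prone regions, and a replay memory balanced between successful and unsuccessful experience. All names, the 2-D push parameterization, and the equal success/failure sampling ratio are illustrative assumptions rather than the authors' implementation; the Q-network itself would be a standard convolutional deep Q-network trained on visual input.

import random
from collections import deque

import numpy as np


def potential_field_action(ee_pos, goal_pos, obstacles, directions):
    """Score each discrete push direction against a potential field that
    attracts toward the goal and repels from nearby obstacles."""
    attract = goal_pos - ee_pos
    field = attract / (np.linalg.norm(attract) + 1e-6)
    for obs in obstacles:
        diff = ee_pos - obs
        dist = np.linalg.norm(diff) + 1e-6
        field += diff / dist ** 3  # repulsion grows sharply near obstacles
    return int(np.argmax([np.dot(field, d) for d in directions]))


def explore_action(q_values, ee_pos, goal_pos, obstacles, directions, eps=0.1):
    """Epsilon-greedy action selection in which the exploratory action
    follows the potential field instead of being drawn uniformly."""
    if random.random() < eps:
        return potential_field_action(ee_pos, goal_pos, obstacles, directions)
    return int(np.argmax(q_values))


class BalancedReplay:
    """Replay memory split into success/failure sub-buffers so that rare
    successful transitions are not drowned out by failed pushes."""

    def __init__(self, capacity=50000):
        self.success = deque(maxlen=capacity)
        self.failure = deque(maxlen=capacity)

    def add(self, transition, succeeded):
        (self.success if succeeded else self.failure).append(transition)

    def sample(self, batch_size):
        # Draw half the batch from each sub-buffer when possible.
        half = batch_size // 2
        batch = random.sample(self.success, min(half, len(self.success)))
        batch += random.sample(self.failure,
                               min(batch_size - len(batch), len(self.failure)))
        return batch

The design intuition: substituting the potential-field action for the uniform random action during exploration keeps the agent out of collisions that would otherwise dominate early experience, while the two-buffer sampling keeps gradient updates from being biased toward the (initially far more common) failed episodes.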
