Robotic Table Tennis with Model-Free Reinforcement Learning

We propose a model-free algorithm for learning efficient policies that return table tennis balls by controlling robot joints at a rate of 100 Hz. We demonstrate that evolutionary search (ES) methods operating on CNN-based policy architectures, which take non-visual inputs and convolve across time, learn compact controllers that produce smooth motions. Furthermore, we show that with appropriately tuned curriculum learning on the task and rewards, policies develop multi-modal playing styles, specifically forehand and backhand strokes, while achieving an 80% return rate on a wide range of ball throws. We observe that this multi-modality does not require any architectural priors, such as multi-head networks or hierarchical policies.
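
To make the two ingredients above concrete, the sketch below shows (a) a policy that applies a 1D convolution across a short time window of non-visual observations and (b) a basic antithetic-sampling evolution strategies update. This is a minimal illustration under stated assumptions, not the authors' implementation: all dimensions (OBS_DIM, HISTORY, ACT_DIM, KERNEL, CHANNELS), the plain-numpy realization, and the episode_return callback are hypothetical placeholders for exposition.

```python
# Minimal sketch (NOT the authors' code): a 1D temporal-CNN policy over a
# short history of non-visual observations, plus a basic evolution-strategies
# (ES) update with mirrored perturbations. All dimensions, names, and the
# reward interface are illustrative assumptions.
import numpy as np

OBS_DIM = 10    # e.g. ball position/velocity + joint angles (assumed)
HISTORY = 8     # number of past 100 Hz timesteps convolved over (assumed)
ACT_DIM = 7     # joint commands for a 7-DoF arm (assumed)
KERNEL = 3      # temporal convolution width (assumed)
CHANNELS = 16   # convolution output channels (assumed)

def init_params(rng):
    """Flat parameter vector: one temporal conv layer + one linear head."""
    conv_w = rng.standard_normal((CHANNELS, OBS_DIM, KERNEL)) * 0.1
    head_w = rng.standard_normal((ACT_DIM, CHANNELS * (HISTORY - KERNEL + 1))) * 0.1
    return np.concatenate([conv_w.ravel(), head_w.ravel()])

def policy(params, obs_history):
    """Map an (OBS_DIM, HISTORY) stack of recent observations to joint commands."""
    n_conv = CHANNELS * OBS_DIM * KERNEL
    conv_w = params[:n_conv].reshape(CHANNELS, OBS_DIM, KERNEL)
    head_w = params[n_conv:].reshape(ACT_DIM, -1)
    # "Valid" 1D convolution across the time axis of the observation stack.
    T = HISTORY - KERNEL + 1
    feats = np.empty((CHANNELS, T))
    for t in range(T):
        window = obs_history[:, t:t + KERNEL]  # (OBS_DIM, KERNEL) slice
        feats[:, t] = np.tanh(np.tensordot(conv_w, window, axes=([1, 2], [0, 1])))
    return np.tanh(head_w @ feats.ravel())     # commands squashed to [-1, 1]

def es_step(params, episode_return, rng, pop=64, sigma=0.05, lr=0.01):
    """One ES update; episode_return(params) -> scalar return of a rollout."""
    eps = rng.standard_normal((pop // 2, params.size))
    eps = np.concatenate([eps, -eps])          # antithetic (mirrored) sampling
    returns = np.array([episode_return(params + sigma * e) for e in eps])
    # Normalize returns so the step size is insensitive to reward scale.
    ranks = (returns - returns.mean()) / (returns.std() + 1e-8)
    grad = (ranks[:, None] * eps).sum(0) / (pop * sigma)
    return params + lr * grad
```

The mirrored perturbations reduce the variance of the gradient estimate at no extra sampling cost, which matters when every return evaluation is a full simulated rally; the per-timestep cost of the policy itself stays small enough for 100 Hz control because the controller is a single convolution plus a linear head.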
