Toward Self-Driving Bicycles Using State-of-the-Art Deep Reinforcement Learning Algorithms

In this paper, we propose a bicycle controller based on Deep Deterministic Policy Gradient (DDPG), a state-of-the-art deep reinforcement learning algorithm. The controller is built from a reward function and a deep neural network. With the proposed controller, a bicycle can not only remain stably balanced but also travel to any specified location. We confirm that the DDPG-based controller outperforms baselines such as Normalized Advantage Function (NAF) and Proximal Policy Optimization (PPO). For the performance evaluation, we ran the proposed algorithm in various settings, including fixed and random speeds, start locations, and destination locations.
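
The abstract does not state the reward function itself. As an illustration only, the following minimal sketch shows one way a reward could combine the two goals named above, staying balanced and reaching a destination. The function name, the thresholds, and the weights are all hypothetical and are not taken from the paper.

```python
import math


def bicycle_reward(roll_angle, prev_distance, distance,
                   max_roll=math.pi / 6, goal_radius=1.0,
                   balance_weight=0.1, progress_weight=1.0):
    """Hypothetical shaped reward (NOT the reward used in the paper).

    roll_angle    -- lean of the bicycle from vertical, in radians
    prev_distance -- distance to the destination at the previous step
    distance      -- distance to the destination at the current step
    Returns (reward, done).
    """
    # Assumed termination rule: the episode ends when the bicycle falls over.
    if abs(roll_angle) > max_roll:
        return -10.0, True
    # Assumed success rule: the episode ends inside the goal radius.
    if distance <= goal_radius:
        return 10.0, True
    # Small penalty for leaning, so that upright riding is preferred.
    balance_term = -balance_weight * (roll_angle / max_roll) ** 2
    # Positive when the bicycle moved closer to the destination.
    progress_term = progress_weight * (prev_distance - distance)
    return balance_term + progress_term, False
```

In a DDPG training loop, a function like this would be called once per environment step, and the resulting scalar would be stored in the replay buffer together with the state, action, and next state.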
