Controlling a bicycle using the deep deterministic policy gradient algorithm

Controlling a bicycle without human interaction remains a challenge for researchers. Most studies on this topic focus either on the physics of bicycle dynamics or on designing controllers grounded in classical control theory, such as feedback or LQR controllers. This study instead applies a state-of-the-art deep reinforcement learning algorithm, Deep Deterministic Policy Gradient (DDPG), to control the bicycle. The bicycle can use the learned controller (agent) to keep its balance or to reach a specified goal.
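To make the approach concrete, below is a minimal, hedged sketch of two mechanics that characterize DDPG: soft (Polyak) target-network updates and Ornstein-Uhlenbeck exploration noise for continuous actions. All function names, hyperparameters, and the linear-weight stand-ins are illustrative assumptions, not details taken from this study; a real agent would use neural-network actor and critic functions.

```python
import numpy as np

# Illustrative sketch of DDPG mechanics (names and hyperparameters are
# assumptions, not taken from the paper above).

def soft_update(target, source, tau=0.01):
    """Polyak-average source weights into the target network:
    target <- tau * source + (1 - tau) * target."""
    return tau * source + (1.0 - tau) * target

class OUNoise:
    """Mean-reverting (Ornstein-Uhlenbeck) process: temporally correlated
    exploration noise added to the deterministic policy's actions."""
    def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.2, seed=0):
        self.mu, self.theta, self.sigma = mu, theta, sigma
        self.state = np.full(dim, mu, dtype=float)
        self.rng = np.random.default_rng(seed)

    def sample(self):
        # dx = theta * (mu - x) + sigma * N(0, 1): drift back toward mu
        dx = self.theta * (self.mu - self.state) \
             + self.sigma * self.rng.standard_normal(self.state.shape)
        self.state = self.state + dx
        return self.state

# Usage sketch: drift target weights toward the learned weights each step,
# using flat weight vectors as stand-ins for actual network parameters.
target_w = np.zeros(4)
learned_w = np.ones(4)
for _ in range(500):
    target_w = soft_update(target_w, learned_w, tau=0.01)

noise = OUNoise(dim=2)
perturbed_action = np.tanh(learned_w[:2]) + noise.sample()
```

The slow target update stabilizes the critic's bootstrapped targets, while the correlated noise explores smoothly in continuous action spaces such as steering torque, which is why both appear in the original DDPG formulation.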
