Continuity and Smoothness Analysis and Possible Improvement of Traditional Reinforcement Learning Methods

Deep reinforcement learning has become one of the most important branches of artificial intelligence, and its model-free nature has led it to be regarded as a possible route toward strong artificial intelligence. Beyond discrete decision-making tasks, deep reinforcement learning is increasingly applied to continuous control. Nevertheless, compared with classical control strategies, the instability of deep reinforcement learning limits its wide application in real-world scenarios. This instability stems mainly from two sources: the inherent discontinuity of the learned action policy, and the stochasticity of the action policy. This paper discusses the theoretical causes of this instability and evaluates the instability of current mainstream reinforcement learning algorithms by time-frequency analysis. Finally, we give an improved framework based on stochastic differential equations that, in theory, resolves the inherent discontinuity of the reinforcement learning action policy.
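The time-frequency evaluation described above can be sketched in a minimal form: a discontinuous (chattering) control signal concentrates a visible fraction of its spectral energy at high frequencies, while a smooth policy output does not. The function name, the cutoff frequency, and the two synthetic trajectories below are illustrative assumptions, not the paper's actual metric or data.

```python
import numpy as np

def high_freq_energy_ratio(actions, dt=0.05, cutoff_hz=1.0):
    """Fraction of spectral energy of an action trajectory above cutoff_hz.

    A higher ratio indicates a less smooth (more chattering) control signal.
    """
    actions = np.asarray(actions, dtype=float)
    # Remove the mean so the DC component does not dominate the spectrum.
    spectrum = np.abs(np.fft.rfft(actions - actions.mean())) ** 2
    freqs = np.fft.rfftfreq(actions.size, d=dt)
    total = spectrum.sum()
    if total == 0.0:
        return 0.0
    return float(spectrum[freqs > cutoff_hz].sum() / total)

# Two synthetic 1-D action trajectories sampled at 20 Hz for 10 s.
t = np.arange(0.0, 10.0, 0.05)
smooth = np.sin(2 * np.pi * 0.2 * t)        # slowly varying control
chattering = np.sign(smooth)                # bang-bang (discontinuous) control

r_smooth = high_freq_energy_ratio(smooth)
r_chatter = high_freq_energy_ratio(chattering)
```

Under this toy metric, the discontinuous bang-bang trajectory yields a markedly larger high-frequency energy ratio than the smooth sinusoid, which is the qualitative signature the time-frequency analysis of policy outputs is meant to expose.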
