Deep Rational Reinforcement Learning

Recent insights from biology show that intelligence emerges not only from the connections between neurons, but also that individual neurons shoulder more computational responsibility than previously anticipated. This perspective should be critical in the context of constantly changing, distinct reinforcement learning environments, yet current approaches still primarily employ static activation functions. In this work, we motivate why rational functions are suitable as adaptable activation functions and why their inclusion in neural networks is crucial. Inspired by recurrence in residual networks, we derive a condition under which rational units are closed under residual connections and formulate a naturally regularised version: the recurrent-rational. We demonstrate that equipping popular algorithms with (recurrent-)rational activations leads to consistent improvements on Atari games, especially turning simple DQN into a solid approach, competitive with DDQN and Rainbow.
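To make the residual-closure statement concrete, the following is a plausible reconstruction of the argument, assuming the Padé-style parametrisation of a rational unit as a ratio of polynomials with numerator degree m and denominator degree n:

```latex
% Reconstruction of the residual-closure argument (our sketch, not taken
% verbatim from the paper). A rational unit of degree (m, n):
\[
  R(x) \;=\; \frac{P(x)}{Q(x)}
       \;=\; \frac{\sum_{j=0}^{m} a_j x^j}{1 + \sum_{k=1}^{n} b_k x^k}
\]
% A residual connection wraps the unit as x + R(x):
\[
  x + R(x) \;=\; \frac{x\,Q(x) + P(x)}{Q(x)}
\]
% The numerator now has degree max(m, n+1) while the denominator keeps
% degree n, so the result stays inside the same degree-(m, n) family
% exactly when m = n + 1.
```

Below is a minimal PyTorch sketch of such a learnable rational unit. This is hypothetical, simplified code rather than the authors' implementation; the class name is ours, and the "safe" absolute-value denominator, which keeps Q(x) strictly positive, is an assumption following the Padé Activation Unit literature:

```python
import torch
import torch.nn as nn


class RationalActivation(nn.Module):
    """Learnable rational activation R(x) = P(x) / Q(x) -- a minimal sketch.

    Degrees follow the closure condition m = n + 1 (m = 5, n = 4 is a
    common Pade-style choice). Initialisation that fits the coefficients
    to a known activation (e.g. Leaky ReLU) is omitted for brevity.
    """

    def __init__(self, m: int = 5, n: int = 4):
        super().__init__()
        assert m == n + 1, "closed under residual connections only when m = n + 1"
        self.a = nn.Parameter(torch.randn(m + 1) * 0.1)  # numerator coeffs a_0..a_m
        self.b = nn.Parameter(torch.randn(n) * 0.1)      # denominator coeffs b_1..b_n

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # P(x) = sum_j a_j * x^j
        p_powers = torch.stack([x ** j for j in range(self.a.numel())], dim=-1)
        p = (p_powers * self.a).sum(dim=-1)
        # "Safe" denominator Q(x) = 1 + |sum_k b_k * x^k|, strictly positive
        q_powers = torch.stack([x ** (k + 1) for k in range(self.b.numel())], dim=-1)
        q = 1.0 + (q_powers * self.b).sum(dim=-1).abs()
        return p / q
```

A recurrent-rational network would then share one RationalActivation instance across all layers instead of learning a separate set of coefficients per layer; this weight sharing is what yields the natural regularisation described above.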
