Risk-aware curriculum generation for heavy-tailed task distributions

Automated curriculum generation for reinforcement learning (RL) aims to speed up learning by designing a sequence of tasks of increasing difficulty. Such tasks are usually drawn from probability distributions with exponentially bounded tails, such as uniform or Gaussian distributions. Existing approaches, however, overlook heavy-tailed task distributions. Under such distributions, current methods may fail to learn optimal policies in rare tasks, which fall in the tails, and in risky tasks, which yield the lowest returns. We address this challenge by proposing a risk-aware curriculum generation algorithm that simultaneously creates two curricula: 1) a primary curriculum that aims to maximize the expected discounted return with respect to a distribution over target tasks, and 2) an auxiliary curriculum that identifies and over-samples rare and risky tasks observed in the primary curriculum. Our empirical results show that the proposed algorithm achieves significantly higher returns in both frequent and rare tasks compared to state-of-the-art methods.
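
To make the dual-curriculum idea concrete, the sketch below illustrates one plausible way the auxiliary curriculum could over-sample rare and risky tasks drawn during the primary curriculum. It is a minimal illustration only, not the paper's actual method: the function name `auxiliary_sampling_weights`, the `target_log_prob` callable, and the thresholds `alpha` and `rarity_quantile` are assumptions introduced here for the example (the risky set is a CVaR-style lowest-return quantile).

```python
import numpy as np

def auxiliary_sampling_weights(tasks, returns, target_log_prob,
                               alpha=0.1, rarity_quantile=0.1):
    """Hypothetical sketch: over-sample tasks that are rare under the
    target task distribution or risky (lowest observed returns)."""
    tasks = np.asarray(tasks, dtype=float)
    returns = np.asarray(returns, dtype=float)

    # Rare: low probability under the target task distribution.
    log_p = np.array([target_log_prob(t) for t in tasks])
    rare = log_p <= np.quantile(log_p, rarity_quantile)

    # Risky: returns in the lowest alpha-quantile (a CVaR-style tail set).
    risky = returns <= np.quantile(returns, alpha)

    # Uniform weight over the union of rare and risky tasks,
    # with a small epsilon elsewhere so every task keeps nonzero mass.
    weights = np.where(rare | risky, 1.0, 1e-3)
    return weights / weights.sum()

# Example usage with a heavy-tailed (Cauchy) task sampler and a Gaussian
# target distribution over a 1-D task parameter; all values are illustrative.
rng = np.random.default_rng(0)
sampled_tasks = rng.standard_cauchy(256)                  # heavy-tailed draws
sampled_returns = -np.abs(sampled_tasks) + rng.normal(0, 0.1, 256)
w = auxiliary_sampling_weights(
    sampled_tasks, sampled_returns,
    target_log_prob=lambda t: -0.5 * t ** 2,              # unnormalized Gaussian
)
aux_batch = rng.choice(sampled_tasks, size=32, p=w)       # auxiliary-curriculum batch
```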
