Unobserved Is Not Equal to Non-existent: Using Gaussian Processes to Infer Immediate Rewards Across Contexts

Learning optimal policies in real-world domains with delayed rewards is a major challenge in Reinforcement Learning. We address the credit assignment problem by proposing a Gaussian Process (GP)-based immediate reward approximation algorithm and evaluate its effectiveness in 4 contexts where rewards can be delayed over long trajectories. In one GridWorld game and 8 Atari games, where immediate rewards are available, the proposed GP-inferred reward policy performed at least as well as the immediate reward policy on 7 out of 9 games and significantly outperformed the corresponding delayed reward policy. In e-learning and healthcare applications, we combined GP-inferred immediate rewards with offline Deep Q-Network (DQN) policy induction and showed that the GP-inferred reward policies outperformed the policies induced using delayed rewards in both real-world contexts.
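To illustrate the general idea of inferring per-step rewards from delayed, end-of-trajectory feedback, the following is a minimal sketch, not the paper's exact procedure: it fits a scikit-learn GaussianProcessRegressor from state-action features to a crude per-step target obtained by spreading each trajectory's delayed return evenly over its steps. The feature construction, kernel choice, and even-split target heuristic are assumptions made purely for illustration.

```python
# Illustrative sketch only: GP regression from state-action features to
# inferred immediate rewards, trained on naive even-split targets.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def infer_immediate_rewards(trajectories):
    """trajectories: list of (features, delayed_return) pairs, where
    `features` is a (T, d) array of state-action features for one episode
    and `delayed_return` is the single reward observed at episode end."""
    X, y = [], []
    for features, delayed_return in trajectories:
        T = len(features)
        X.append(features)
        # Crude training target (assumption): split the delayed return
        # evenly across the episode's steps.
        y.append(np.full(T, delayed_return / T))
    X, y = np.vstack(X), np.concatenate(y)

    kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    gp.fit(X, y)
    return gp  # gp.predict(new_features) yields inferred per-step rewards

# Usage with synthetic data: the inferred rewards r_hat could then replace
# the missing immediate rewards when training an RL agent such as a DQN.
rng = np.random.default_rng(0)
trajs = [(rng.normal(size=(20, 4)), rng.normal()) for _ in range(10)]
gp = infer_immediate_rewards(trajs)
r_hat = gp.predict(rng.normal(size=(5, 4)))
```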
