论文信息 - Value-Aware Loss Function for Model Learning in Reinforcement Learning

Value-Aware Loss Function for Model Learning in Reinforcement Learning

We consider the problem of estimating the transition probability kernel to be used by a model-based reinforcement learning (RL) algorithm. We argue that estimating a generative model that minimizes a probabilistic loss, such as the log-loss, might be an overkill because such a probabilistic loss does not take into account the underlying structure of the decision problem and the RL algorithm that intends to solve it. We introduce a loss function that takes the structure of the value function into account. We provide a finite-sample upper bound for the loss function showing the dependence of the error on model approximation error and the number of samples.

Csaba Szepesvari | Amir-massoud Farahmand | D. Nikovski | Gergely Neu | V. Gómez

[1] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.

[2] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..

[3] Pierre Geurts,et al. Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res..

[4] Alexander J. Smola,et al. Kernel methods and the exponential family , 2006, ESANN.

[5] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[6] Csaba Szepesvári,et al. Finite-Time Bounds for Fitted Value Iteration , 2008, J. Mach. Learn. Res..

[7] Alborz Geramifard,et al. Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping , 2008, UAI.

[8] Shie Mannor,et al. Regularized Fitted Q-Iteration for planning in continuous-space Markovian decision problems , 2009, 2009 American Control Conference.

[9] Csaba Szepesvári,et al. Model-based and Model-free Reinforcement Learning for Visual Servoing , 2009, 2009 IEEE International Conference on Robotics and Automation.

[10] Csaba Szepesvári,et al. Algorithms for Reinforcement Learning , 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.

[11] Peter Stone,et al. TEXPLORE: real-time sample-efficient reinforcement learning for robots , 2012, Machine Learning.

[12] Alessandro Lazaric,et al. Finite-sample analysis of least-squares policy iteration , 2012, J. Mach. Learn. Res..

[13] Carl E. Rasmussen,et al. Gaussian Processes for Data-Efficient Learning in Robotics and Control , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.

[15] Shie Mannor,et al. Regularized Policy Iteration with Nonparametric Function Spaces , 2016, J. Mach. Learn. Res..

[16] Bernardo Ávila Pires,et al. Policy Error Bounds for Model-Based Reinforcement Learning with Factored Linear Models , 2016, COLT.

[17] Aapo Hyvärinen,et al. Density Estimation in Infinite Dimensional Exponential Families , 2013, J. Mach. Learn. Res..