Paper information - Inferring bounds on the performance of a control policy from a sample of trajectories

Inferring bounds on the performance of a control policy from a sample of trajectories

We propose an approach for inferring bounds on the finite-horizon return of a control policy from an off-policy sample of trajectories comprising state transitions, rewards, and control actions. The dynamics, control policy, and reward function are assumed to be deterministic and Lipschitz continuous. Under these assumptions, we derive an algorithm that computes these bounds in time polynomial in the sample size and the length of the optimization horizon, and we characterize the tightness of the bounds in terms of the sample density.
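To make the idea in the abstract concrete, here is a minimal Python sketch of one way such bounds could be computed. It is an illustration under stated assumptions, not the paper's algorithm: states are scalars, the sample has already been reduced to transitions whose action agrees with the policy, and `l_f` and `l_r` are assumed Lipschitz constants of the closed-loop dynamics and reward. The names `return_bounds` and `lipschitz_constants` are hypothetical. The sketch chains sampled transitions, penalizes each jump between a next state and the following sampled state by a Lipschitz constant of the remaining return, and searches over chains with a backward dynamic-programming pass costing on the order of (sample size)^2 times the horizon.

```python
from typing import List, Tuple

# (x, r, y): sampled state, observed reward, observed next state.
# Assumption: every transition was generated with the action the policy
# would choose in x, and states are scalars for simplicity.
Transition = Tuple[float, float, float]


def lipschitz_constants(l_f: float, l_r: float, horizon: int) -> List[float]:
    """consts[n] is a Lipschitz constant of the n-step return, n = 0..horizon."""
    consts = [0.0]
    for _ in range(horizon):
        # n-step return = one reward + (n-1)-step return of the next state.
        consts.append(l_r + l_f * consts[-1])
    return consts


def return_bounds(sample: List[Transition], x0: float, horizon: int,
                  l_f: float, l_r: float) -> Tuple[float, float]:
    """Lower and upper bounds on the horizon-step return from x0."""
    if horizon < 1 or not sample:
        raise ValueError("need horizon >= 1 and a non-empty sample")
    consts = lipschitz_constants(l_f, l_r, horizon)

    # lo[i] / hi[i]: best bounds on the remaining return when transition i
    # is used at the current step. At the last step only the reward remains.
    lo = [r for _, r, _ in sample]
    hi = list(lo)
    for t in range(horizon - 2, -1, -1):
        k = consts[horizon - t - 1]  # constant of the return after step t
        lo = [r + max(v - k * abs(y - x) for v, (x, _, _) in zip(lo, sample))
              for _, r, y in sample]
        hi = [r + min(v + k * abs(y - x) for v, (x, _, _) in zip(hi, sample))
              for _, r, y in sample]

    # Finally account for the jump from x0 to the first sampled state.
    k0 = consts[horizon]
    lower = max(v - k0 * abs(x0 - x) for v, (x, _, _) in zip(lo, sample))
    upper = min(v + k0 * abs(x0 - x) for v, (x, _, _) in zip(hi, sample))
    return lower, upper


# Toy closed loop (hypothetical): next state 0.5 * x, reward -|x|.
demo_sample = [(k / 10, -abs(k / 10), 0.5 * (k / 10)) for k in range(-10, 11)]
print(return_bounds(demo_sample, x0=0.8, horizon=3, l_f=0.5, l_r=1.0))
```

In this sketch the gap between the two bounds shrinks as sampled states lie closer to the states the policy actually visits, which mirrors the abstract's statement that tightness depends on the sample density.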

Louis Wehenkel | Susan A. Murphy | Damien Ernst | Raphaël Fonteneau
