Inferring bounds on the performance of a control policy from a sample of trajectories
[1] D. Ernst. Selecting concise sets of samples for a reinforcement learning agent, 2005.
[2] A. J. Viterbi. Error Bounds for Convolutional Codes and an Asymptotically Optimum Decoding Algorithm, 1967, IEEE Transactions on Information Theory.
[3] S. Murphy et al. An experimental design for the development of adaptive treatment strategies, 2005, Statistics in Medicine.
[4] S. Murphy et al. Optimal dynamic treatment regimes, 2003.
[5] Eduardo F. Camacho et al. Model Predictive Controllers, 2007.
[6] Michail G. Lagoudakis et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[7] Damien Ernst, Pierre Geurts, and Louis Wehenkel. Tree-Based Batch Mode Reinforcement Learning, 2005, J. Mach. Learn. Res.
[8] Manfred K. Warmuth et al. On the worst-case analysis of temporal-difference learning algorithms, 2004, Machine Learning.
[9] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction, 1998, MIT Press.
[10] Michael Kearns et al. Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms, 1998, NIPS.
[11] Dimitri P. Bertsekas. Dynamic Programming and Optimal Control, Two Volume Set, 1995, Athena Scientific.
[12] Dimitri P. Bertsekas and John N. Tsitsiklis. Neuro-Dynamic Programming, 1996, Athena Scientific.
[13] J. Ingersoll. Theory of Financial Decision Making, 1987.