Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path
[1] R. Bellman et al. Functional Approximations and Dynamic Programming, 1959.
[2] E. Cheney. Introduction to Approximation Theory, 1966.
[3] Y. Davydov. Mixing Conditions for Markov Chains, 1974.
[4] A. G. Barto et al. Toward a modern theory of adaptive networks: expectation and prediction, 1981, Psychological Review.
[5] D. Pollard. Convergence of Stochastic Processes, 1984.
[6] P. Schweitzer et al. Generalized polynomial approximations in Markovian decision processes, 1985.
[7] D. Pollard. Empirical Processes: Theory and Applications, 1990.
[8] Richard L. Tweedie et al. Markov Chains and Stochastic Stability, 1993, Communications and Control Engineering Series.
[9] Bin Yu. Rates of convergence for empirical processes of stationary mixing sequences, 1994.
[10] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming, 1995, ICML.
[11] David Haussler et al. Sphere Packing Numbers for Subsets of the Boolean n-Cube with Bounded Vapnik-Chervonenkis Dimension, 1995, J. Comb. Theory, Ser. A.
[12] John N. Tsitsiklis et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[13] A. Nobel. Histogram regression estimation using data-dependent partitions, 1996.
[14] Richard S. Sutton et al. Introduction to Reinforcement Learning, 1998.
[15] Peter L. Bartlett et al. Learning in Neural Networks: Theoretical Foundations, 1999.
[16] Peter L. Bartlett et al. Neural Network Learning: Theoretical Foundations, 1999.
[17] Arthur L. Samuel et al. Some studies in machine learning using the game of checkers, 2000, IBM J. Res. Dev.
[18] Carlos Guestrin et al. Max-norm Projections for Factored MDPs, 2001, IJCAI.
[19] Y. Baraud et al. Adaptive estimation in autoregression or β-mixing regression via model selection, 2001.
[20] Xin Wang et al. Batch Value Function Approximation via Support Vectors, 2001, NIPS.
[21] Sanjoy Dasgupta et al. Off-Policy Temporal Difference Learning with Function Approximation, 2001, ICML.
[22] Xiaohong Chen et al. Mixing and moment properties of various GARCH and stochastic volatility models, 2002, Econometric Theory.
[23] Rémi Munos et al. Error Bounds for Approximate Policy Iteration, 2003, ICML.
[24] Michail G. Lagoudakis et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[25] Ron Meir et al. Nonparametric Time Series Prediction Through Adaptive Model Selection, 2000, Machine Learning.
[26] John N. Tsitsiklis et al. Feature-based methods for large scale dynamic programming, 2004, Machine Learning.
[27] William D. Smart et al. Interpolation-based Q-learning, 2004, ICML.
[28] Csaba Szepesvári et al. Finite time bounds for sampling based fitted value iteration, 2005, ICML.
[29] Pierre Geurts et al. Tree-Based Batch Mode Reinforcement Learning, 2005, J. Mach. Learn. Res.
[30] Richard S. Sutton et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[31] Susan A. Murphy et al. A Generalization Error for Q-Learning, 2005, J. Mach. Learn. Res.
[32] Dimitri P. Bertsekas et al. Stochastic optimal control: the discrete time case, 2007.