[1] T. Söderström, et al. Instrumental variable methods for system identification, 1983.
[2] Graham C. Goodwin, et al. Adaptive filtering prediction and control, 1984.
[3] Heekuck Oh, et al. Neural Networks for Pattern Recognition, 1993, Adv. Comput.
[4] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[5] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[6] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[7] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS.
[8] Stuart J. Russell, et al. Bayesian Q-Learning, 1998, AAAI/IAAI.
[9] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.
[10] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[11] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[12] T. Başar, et al. A New Approach to Linear Filtering and Prediction Problems, 2001.
[13] Ralf Schoknecht, et al. Optimality of Reinforcement Learning Algorithms with Linear Function Approximation, 2002, NIPS.
[14] Shie Mannor, et al. Bayes Meets Bellman: The Gaussian Process Approach to Temporal Difference Learning, 2003, ICML.
[15] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[16] Jeffrey K. Uhlmann, et al. Unscented filtering and nonlinear estimation, 2004, Proceedings of the IEEE.
[17] Steven J. Bradtke, et al. Linear Least-Squares algorithms for temporal difference learning, 2004, Machine Learning.
[18] Rudolph van der Merwe, et al. Sigma-point Kalman filters for probabilistic inference in dynamic state-space models, 2004.
[19] Justin A. Boyan, et al. Technical Update: Least-Squares Temporal Difference Learning, 2002, Machine Learning.
[20] Sang Woo Kim, et al. Consistent normalized least mean square filtering with noisy data matrix, 2005, IEEE Transactions on Signal Processing.
[21] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[22] Yaakov Engel, et al. Algorithms and representations for reinforcement learning, 2005.
[23] Shie Mannor, et al. Reinforcement learning with Gaussian processes, 2005, ICML.
[24] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.
[25] Lihong Li, et al. PAC model-free reinforcement learning, 2006, ICML.
[26] D. Simon. Optimal State Estimation: Kalman, H Infinity, and Nonlinear Approaches, 2006.
[27] David Choi, et al. A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning, 2001, Discret. Event Dyn. Syst.
[28] Alborz Geramifard, et al. Incremental Least-Squares Temporal Difference Learning, 2006, AAAI.
[29] Robert Fitch, et al. Tracking value function dynamics to improve reinforcement learning with piecewise linear function approximation, 2007, ICML '07.
[30] Shalabh Bhatnagar, et al. Incremental Natural Actor-Critic Algorithms, 2007, NIPS.
[31] Thomas Martinetz, et al. Improving Optimality of Neural Rewards Regression for Data-Efficient Batch Near-Optimal Policy Identification, 2007, ICANN.
[32] Csaba Szepesvári, et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, 2006, Machine Learning.
[33] Richard S. Sutton, et al. On the role of tracking in stationary environments, 2007, ICML '07.
[34] Gene H. Golub, et al. Methods for modifying matrix factorizations, 1972, Milestones in Matrix Computation.
[35] D. Bertsekas, et al. Q-learning algorithms for optimal stopping based on least squares, 2007, European Control Conference (ECC).
[36] Matthieu Geist, et al. Bayesian Reward Filtering, 2008, EWRL.
[37] Andrew Y. Ng, et al. Regularization and feature selection in least-squares temporal difference learning, 2009, ICML '09.
[38] Matthieu Geist, et al. Tracking in Reinforcement Learning, 2009, ICONIP.
[39] Matthieu Geist, et al. Kalman Temporal Differences: The deterministic case, 2009, IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning.
[40] Matthieu Geist, et al. Eligibility traces through colored noises, 2010, International Congress on Ultra Modern Telecommunications and Control Systems.
[41] O. Pietquin, et al. Managing Uncertainty within Value Function Approximation in Reinforcement Learning, 2010.
[42] Matthieu Geist, et al. Statistically linearized least-squares temporal differences, 2010, International Congress on Ultra Modern Telecommunications and Control Systems.
[43] Matthieu Geist, et al. Revisiting Natural Actor-Critics with Value Function Approximation, 2010, MDAI.
[44] O. Pietquin, et al. Statistically linearized recursive least squares, 2010, IEEE International Workshop on Machine Learning for Signal Processing.
[45] Olivier Buffet, et al. Markov Decision Processes in Artificial Intelligence, 2010.