Managing Uncertainty within Value Function Approximation in Reinforcement Learning
[1] W. R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples, 1933.
[2] Leslie Pack Kaelbling, et al. Learning in embedded systems, 1993.
[3] Stuart J. Russell, et al. Bayesian Q-Learning, 1998, AAAI/IAAI.
[4] Malcolm J. A. Strens, et al. A Bayesian Framework for Reinforcement Learning, 2000, ICML.
[5] T. Başar, et al. A New Approach to Linear Filtering and Prediction Problems, 2001.
[6] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[7] Yutaka Sakaguchi, et al. Reliability of internal prediction/estimation and its application. I. Adaptive action selection reflecting reliability of value function, 2004, Neural Networks.
[8] Jeffrey K. Uhlmann, et al. Unscented filtering and nonlinear estimation, 2004, Proceedings of the IEEE.
[9] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[10] Yaakov Engel, et al. Algorithms and representations for reinforcement learning (with Hebrew abstract, table of contents and additional title page), 2005.
[11] D. Bertsekas, et al. Q-learning algorithms for optimal stopping based on least squares, 2007, 2007 European Control Conference (ECC).
[12] Michael L. Littman, et al. An analysis of model-based Interval Estimation for Markov Decision Processes, 2008, J. Comput. Syst. Sci.
[13] Andrew Y. Ng, et al. Near-Bayesian exploration in polynomial time, 2009, ICML '09.
[14] Matthieu Geist, et al. Tracking in Reinforcement Learning, 2009, ICONIP.
[15] Matthieu Geist, et al. Kalman Temporal Differences: The deterministic case, 2009, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning.