Efficient Exploration With Latent Structure
[1] E. Ordentlich,et al. Inequalities for the L1 Deviation of the Empirical Distribution , 2003 .
[2] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..
[3] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[4] P. W. Jones,et al. Bandit Problems, Sequential Allocation of Experiments , 1987 .
[5] Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 2002, Machine Learning.
[6] D. Rubin,et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper , 1977 .
[7] Michael L. Littman,et al. Incremental Pruning: A Simple, Fast, Exact Method for Partially Observable Markov Decision Processes , 1997, UAI.
[8] Philip W. L. Fong. A Quantitative Study of Hypothesis Selection , 1995, ICML.
[9] Sridhar Mahadevan,et al. Recent Advances in Hierarchical Reinforcement Learning , 2003, Discret. Event Dyn. Syst..
[10] Claude-Nicolas Fiechter,et al. Efficient reinforcement learning , 1994, COLT '94.
[11] D. Sofge. The Role of Exploration in Learning Control , 1992 .
[12] John N. Tsitsiklis,et al. The Sample Complexity of Exploration in the Multi-Armed Bandit Problem , 2004, J. Mach. Learn. Res..
[13] Donald A. Berry,et al. Bandit Problems: Sequential Allocation of Experiments. , 1986 .
[14] Peter Stone,et al. Improving Action Selection in MDP's via Knowledge Transfer , 2005, AAAI.
[15] Michael L. Littman,et al. An empirical evaluation of interval estimation for Markov decision processes , 2004, 16th IEEE International Conference on Tools with Artificial Intelligence.
[16] Sebastian Thrun,et al. The role of exploration in learning control , 1992 .
[17] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[18] Shie Mannor,et al. Action Elimination and Stopping Conditions for Reinforcement Learning , 2003, ICML.
[19] Shie Mannor,et al. PAC Bounds for Multi-armed Bandit and Markov Decision Processes , 2002, COLT.
[20] Alex M. Andrew,et al. Reinforcement Learning: An Introduction , 1998 .
[21] Sridhar Mahadevan,et al. Recent Advances in Hierarchical Reinforcement Learning , 2003, Discret. Event Dyn. Syst..
[22] Leslie Pack Kaelbling,et al. Learning in embedded systems , 1993 .
[23] Michael Kearns,et al. Efficient Reinforcement Learning in Factored MDPs , 1999, IJCAI.
[24] Jürgen Schmidhuber,et al. Efficient model-based exploration , 1998 .
[25] David Andre,et al. Model based Bayesian Exploration , 1999, UAI.