Algorithm-Directed Exploration for Model-Based Reinforcement Learning in Factored MDPs
Dale Schuurmans | Carlos Guestrin | Relu Patrascu
[1] John B. Kidd, et al. Decisions with Multiple Objectives—Preferences and Value Tradeoffs, 1977.
[2] R. L. Keeney, et al. Decisions with Multiple Objectives: Preferences and Value Trade-Offs, 1977, IEEE Transactions on Systems, Man, and Cybernetics.
[3] P. Schweitzer, et al. Generalized polynomial approximations in Markovian decision processes, 1985.
[4] P. W. Jones, et al. Bandit Problems, Sequential Allocation of Experiments, 1987.
[5] Keiji Kanazawa, et al. A model for reasoning about persistence and causation, 1989.
[6] F. B. Vernadat, et al. Decisions with Multiple Objectives: Preferences and Value Tradeoffs, 1994.
[7] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[8] Andrew G. Barto, et al. Improving Elevator Performance Using Reinforcement Learning, 1995, NIPS.
[9] Craig Boutilier, et al. Exploiting Structure in Policy Construction, 1995, IJCAI.
[10] Benjamin Van Roy. Learning and value function approximation in complex decision processes, 1998.
[11] Stuart J. Russell, et al. Bayesian Q-Learning, 1998, AAAI/IAAI.
[12] Daphne Koller, et al. Computing Factored Value Functions for Policies in Structured MDPs, 1999, IJCAI.
[13] Justin A. Boyan, et al. Least-Squares Temporal Difference Learning, 1999, ICML.
[14] Craig Boutilier, et al. Decision-Theoretic Planning: Structural Assumptions and Computational Leverage, 1999, J. Artif. Intell. Res.
[15] Michael Kearns, et al. Efficient Reinforcement Learning in Factored MDPs, 1999, IJCAI.
[16] Daphne Koller, et al. Policy Iteration for Factored MDPs, 2000, UAI.
[17] Judy Goldsmith, et al. Nonapproximability Results for Partially Observable Markov Decision Processes, 2011, Universität Trier, Mathematik/Informatik, Forschungsbericht.
[18] Eric Allender, et al. Complexity of finite-horizon Markov decision process problems, 2000, JACM.
[19] Carlos Guestrin, et al. Max-norm Projections for Factored MDPs, 2001, IJCAI.
[20] Dale Schuurmans, et al. Direct value-approximation for factored MDPs, 2001, NIPS.
[21] Carlos Guestrin, et al. Multiagent Planning with Factored MDPs, 2001, NIPS.
[22] Timothy X. Brown, et al. Switch Packet Arbitration via Queue-Learning, 2001, NIPS.
[23] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[24] Benjamin Van Roy, et al. The Linear Programming Approach to Approximate Dynamic Programming, 2003, Oper. Res.
[25] Steven J. Bradtke, et al. Linear Least-Squares algorithms for temporal difference learning, 2004, Machine Learning.
[26] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 2002, Machine Learning.