The factored policy-gradient planner
[1] Bernhard Nebel, et al. The FF Planning System: Fast Plan Generation Through Heuristic Search, 2001, J. Artif. Intell. Res.
[2] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[3] Lin Zhang, et al. Decision-Theoretic Military Operations Planning, 2004, ICAPS.
[4] Robert Givan, et al. Approximate Policy Iteration with a Policy Language Bias, 2003, NIPS.
[5] Nicol N. Schraudolph, et al. Conjugate Directions for Stochastic Gradient Descent, 2002, ICANN.
[6] Neil Immerman, et al. The Complexity of Decentralized Control of Markov Decision Processes, 2000, UAI.
[7] Andrew Y. Ng, et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping, 1999, ICML.
[8] Kee-Eung Kim, et al. Learning to Cooperate via Policy Search, 2000, UAI.
[9] Peter L. Bartlett, et al. Experiments with Infinite-Horizon, Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.
[10] Mausam, et al. Challenges for Temporal Planning with Uncertain Durations, 2006, ICAPS.
[11] Stefan Schaal, et al. Natural Actor-Critic, 2008, Neurocomputing.
[12] Olivier Buffet, et al. FF + FPG: Guiding a Policy-Gradient Planner, 2007, ICAPS.
[13] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Athena Scientific.
[14] Shlomo Zilberstein, et al. LAO*: A heuristic search algorithm that finds solutions with loops, 2001, Artif. Intell.
[15] Kurt Driessens, et al. Relational Reinforcement Learning, 2001, Machine Learning.
[16] Douglas Aberdeen, et al. Policy-Gradient Methods for Planning, 2005, NIPS.
[17] Andrew G. Barto, et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.
[18] Douglas Aberdeen, et al. Scalable Internal-State Policy-Gradient Methods for POMDPs, 2002, ICML.
[19] Robert Givan, et al. FF-Replan: A Baseline for Probabilistic Planning, 2007, ICAPS.
[20] Mausam, et al. Planning with Durative Actions in Stochastic Domains, 2008, J. Artif. Intell. Res.
[21] Charles Gretton. Gradient-Based Relational Reinforcement Learning of Temporally Extended Policies, 2007, ICAPS.
[22] Sylvie Thiébaux, et al. Probabilistic planning vs. replanning, 2007.
[23] Avrim Blum, et al. Fast Planning Through Planning Graph Analysis, 1995, IJCAI.
[24] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, MIT Press.
[25] Peter L. Bartlett, et al. Infinite-Horizon Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.
[26] Shigenobu Kobayashi, et al. Reinforcement Learning in POMDPs with Function Approximation, 1997, ICML.
[27] Mausam, et al. Concurrent Probabilistic Temporal Planning, 2005, ICAPS.
[28] Lex Weaver, et al. A Multi-Agent Policy-Gradient Approach to Network Routing, 2001, ICML.
[29] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[30] Craig Boutilier, et al. Sequential Optimality and Coordination in Multiagent Systems, 1999, IJCAI.
[31] Mausam, et al. Probabilistic Temporal Planning with Uncertain Durations, 2006, AAAI.
[32] Alan Fern, et al. Discriminative Learning of Beam-Search Heuristics for Planning, 2007, IJCAI.
[33] Geoffrey J. Gordon. Reinforcement Learning with Function Approximation Converges to a Region, 2000, NIPS.
[34] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[35] Håkan L. S. Younes. Extending PDDL to Model Stochastic Decision Processes, 2003.
[36] Michail G. Lagoudakis, et al. Coordinated Reinforcement Learning, 2002, ICML.
[37] R. J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[38] Subbarao Kambhampati, et al. When is Temporal Planning Really Temporal?, 2007, IJCAI.
[39] Jesse Hoey, et al. SPUDD: Stochastic Planning using Decision Diagrams, 1999, UAI.
[40] Scott Sanner, et al. Practical Linear Value-approximation Techniques for First-order MDPs, 2006, UAI.
[41] Pau-Lo Hsu, et al. A cooperative policy for conflict resolution to multi-agent exploration, 2010.
[42] David E. Smith, et al. Conditional Effects in Graphplan, 1998, AIPS.
[43] Olivier Buffet, et al. Concurrent Probabilistic Temporal Planning with Policy-Gradients, 2007, ICAPS.
[44] Håkan L. S. Younes, et al. PPDDL 1.0: An Extension to PDDL for Expressing Planning Domains with Probabilistic Effects, 2004.
[45] Jonathan Baxter, et al. Scaling Internal-State Policy-Gradient Methods for POMDPs, 2002.
[46] Ari K. Jónsson, et al. MAPGEN: Mixed-Initiative Planning and Scheduling for the Mars Exploration Rover Mission, 2004, IEEE Intell. Syst.
[47] Maria Fox, et al. PDDL2.1: An Extension to PDDL for Expressing Temporal Planning Domains, 2003, J. Artif. Intell. Res.
[48] Håkan L. S. Younes, et al. Policy Generation for Continuous-time Stochastic Domains with Concurrency, 2004, ICAPS.
[49] Pierre Priouret, et al. Adaptive Algorithms and Stochastic Approximations, 1990, Applications of Mathematics.
[50] Sylvie Thiébaux, et al. Concurrent Probabilistic Planning in the Graphplan Framework, 2006, ICAPS.
[51] Sylvie Thiébaux, et al. Prottle: A Probabilistic Temporal Planner, 2005, AAAI.
[52] Peter L. Bartlett, et al. Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning, 2004, J. Mach. Learn. Res.