Robust planning in domains with stochastic outcomes, adversaries, and partial observability
[1] S. Karlin,et al. SOLUTIONS OF CONVEX GAMES AS FIXED-POINTS , 1951 .
[2] N. Dalkey. EQUIVALENCE OF INFORMATION PATTERNS AND ESSENTIALLY DETERMINATE GAMES , 1952 .
[3] H. W. Kuhn,et al. Extensive Games and the Problem of Information , 1953 .
[4] Ronald A. Howard,et al. Dynamic Programming and Markov Processes , 1960 .
[5] J. F. Benders. Partitioning procedures for solving mixed-variables programming problems , 1962 .
[6] R. Selten. Reexamination of the perfectness concept for equilibrium points in extensive games , 1975, Classics in Game Theory.
[7] Hanif D. Sherali,et al. Linear Programming and Network Flows , 1977 .
[8] I. Duff,et al. Direct Methods for Sparse Matrices , 1987 .
[9] Baruch Schieber,et al. The Canadian Traveller Problem , 1991, SODA '91.
[10] Mihalis Yannakakis,et al. Shortest Paths Without a Map , 1989, Theor. Comput. Sci..
[11] Philip J. Reny,et al. Rationality in Extensive-Form Games , 1992 .
[12] D. Koller,et al. The complexity of two-person zero-sum games in extensive form , 1992 .
[13] C. Atkeson,et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time , 1993, Machine Learning.
[14] J. Hiriart-Urruty,et al. Convex analysis and minimization algorithms , 1993 .
[15] Reinhard Selten,et al. Multistage Game Models and Delay Supergames , 1994 .
[16] Michael L. Littman,et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.
[17] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[18] Bernhard von Stengel,et al. Fast algorithms for finding randomized strategies in game trees , 1994, STOC '94.
[19] Richard Barrett,et al. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods , 1994, Other Titles in Applied Mathematics.
[20] Leslie Pack Kaelbling,et al. Planning under Time Constraints in Stochastic Domains , 1993, Artif. Intell..
[21] Nicolò Cesa-Bianchi,et al. Gambling in a rigged casino: The adversarial multi-armed bandit problem , 1995, Proceedings of IEEE 36th Annual Foundations of Computer Science.
[22] Peter Norvig,et al. Artificial Intelligence: A Modern Approach , 1995 .
[23] Avrim Blum,et al. Fast Planning Through Planning Graph Analysis , 1995, IJCAI.
[24] Thomas Dean,et al. Decomposition Techniques for Planning in Stochastic Domains , 1995, IJCAI.
[25] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[26] Andrew G. Barto,et al. Learning to Act Using Real-Time Dynamic Programming , 1995, Artif. Intell..
[27] B. Stengel,et al. COMPUTING EQUILIBRIA FOR TWO-PERSON GAMES , 1996 .
[28] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[29] Yoav Freund,et al. Game theory, on-line prediction and boosting , 1996, COLT '96.
[30] Ariel Rubinstein,et al. On the Interpretation of Decision Problems with Imperfect Recall , 1996, TARK.
[31] David Andre,et al. Generalized Prioritized Sweeping , 1997, NIPS.
[32] Amedeo Cesta,et al. Recent Advances in AI Planning , 1997, Lecture Notes in Computer Science.
[33] Leslie Pack Kaelbling,et al. Planning and Acting in Partially Observable Stochastic Domains , 1998, Artif. Intell..
[34] Geoffrey J. Gordon,et al. Approximate solutions to markov decision processes , 1999 .
[35] R. A. Park. Shortest Paths in a Dynamic Uncertain Domain , 1999 .
[36] Marco Wiering,et al. Explorations in efficient reinforcement learning , 1999 .
[37] Manuela Veloso,et al. An Analysis of Stochastic Game Theory for Multiagent Reinforcement Learning , 2000 .
[38] Russ Bubley,et al. Randomized algorithms , 1995, CSUR.
[39] Michael L. Littman,et al. Abstraction Methods for Game Theoretic Poker , 2000, Computers and Games.
[40] Shlomo Zilberstein,et al. LAO*: A heuristic search algorithm that finds solutions with loops , 2001, Artif. Intell..
[41] Sebastian Thrun,et al. A system for multi-agent coordination in uncertain environments , 2001, AGENTS '01.
[42] Sven Koenig,et al. Incremental A* , 2001, NIPS.
[43] M. Rosencrantz,et al. Locating Moving Entities in Dynamic Indoor Environments with Teams of Mobile Robots , 2002 .
[44] Peter Auer,et al. The Nonstochastic Multiarmed Bandit Problem , 2002, SIAM J. Comput..
[45] Anshul Gupta,et al. Recent advances in direct methods for solving unsymmetric sparse systems of linear equations , 2002, TOMS.
[46] Koby Crammer,et al. Advances in Neural Information Processing Systems 14 , 2002 .
[47] William H. Press,et al. Numerical recipes in C , 2002 .
[48] Adam Tauman Kalai,et al. Geometric algorithms for online optimization , 2002 .
[49] Manfred K. Warmuth,et al. Path Kernels and Multiplicative Updates , 2002, J. Mach. Learn. Res..
[50] Edith Cohen,et al. Making intra-domain routing robust to changing and uncertain traffic demands: understanding fundamental tradeoffs , 2003, SIGCOMM '03.
[51] Edith Cohen,et al. Optimal oblivious routing in polynomial time , 2003, STOC '03.
[52] Blai Bonet,et al. Faster Heuristic Search Algorithms for Planning with Uncertainty and Full Feedback , 2003, IJCAI.
[53] Blai Bonet,et al. Labeled RTDP: Improving the Convergence of Real-Time Dynamic Programming , 2003, ICAPS.
[54] H. Brendan McMahan,et al. Planning in Cost-Paired Markov Decision Process Games , 2003 .
[55] Adam Meyerson,et al. Online oblivious routing , 2003, SPAA '03.
[56] Marcin Bienkowski,et al. A practical algorithm for constructing oblivious routing schemes , 2003, SPAA '03.
[57] Jonathan Schaeffer,et al. Approximating Game-Theoretic Optimal Strategies for Full-scale Poker , 2003, IJCAI.
[58] Avrim Blum,et al. Planning in the Presence of Cost Functions Controlled by an Adversary , 2003, ICML.
[59] Sebastian Thrun,et al. Locating moving entities in indoor environments with teams of mobile robots , 2003, AAMAS '03.
[60] Martin Zinkevich,et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent , 2003, ICML.
[61] Sylvain Sorin,et al. Stochastic Games and Applications , 2003 .
[62] Thomas G. Dietterich,et al. Explanation-Based Learning and Reinforcement Learning: A Unified View , 1995, Machine Learning.
[63] Anthony Stentz,et al. Focused Dynamic Programming: Extensive Comparative Results , 2004 .
[64] Baruch Awerbuch,et al. Adaptive routing with end-to-end feedback: distributed learning and geometric approaches , 2004, STOC '04.
[65] R. Ravi,et al. Boosted sampling: approximation algorithms for stochastic optimization , 2004, STOC '04.
[66] Reid G. Simmons,et al. Heuristic Search Value Iteration for POMDPs , 2004, UAI.
[67] Yishay Mansour,et al. Experts in a Markov Decision Process , 2004, NIPS.
[68] Giacomo Bonanno,et al. Memory and perfect recall in extensive games , 2004, Games Econ. Behav..
[69] R. Ravi,et al. Hedging Uncertainty: Approximation Algorithms for Stochastic Optimization Problems , 2004, Math. Program..
[70] Sebastian Thrun,et al. Planning for Markov Decision Processes with Sparse Stochasticity , 2004, NIPS.
[71] Jeff G. Schneider,et al. Approximate solutions for partially observable stochastic games with common payoffs , 2004, Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2004).
[72] Shlomo Zilberstein,et al. Dynamic Programming for Partially Observable Stochastic Games , 2004, AAAI.
[73] Avrim Blum,et al. Online Geometric Optimization in the Bandit Setting Against an Adaptive Adversary , 2004, COLT.
[74] Andrew W. Moore,et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time , 1993, Machine Learning.
[75] Nicole Immorlica,et al. On the costs and benefits of procrastination: approximation algorithms for stochastic combinatorial optimization problems , 2004, SODA '04.
[76] Sebastian Thrun,et al. PAO for planning with hidden state , 2004, IEEE International Conference on Robotics and Automation (ICRA '04).
[77] Sean R Eddy,et al. What is dynamic programming? , 2004, Nature Biotechnology.
[78] No-Regret Algorithms for Structured Prediction Problems , 2005 .
[79] Geoffrey J. Gordon,et al. Generalizing Dijkstra's Algorithm and Gaussian Elimination for Solving MDPs , 2005 .
[80] Adam Tauman Kalai,et al. Online convex optimization in the bandit setting: gradient descent without a gradient , 2004, SODA '05.
[81] Jacques F. Benders,et al. Partitioning procedures for solving mixed-variables programming problems , 2005, Comput. Manag. Sci..
[82] Tuomas Sandholm,et al. Optimal Rhode Island Hold'em Poker , 2005, AAAI.
[83] Geoffrey J. Gordon,et al. Finding Approximate POMDP solutions Through Belief Compression , 2011, J. Artif. Intell. Res..
[84] Geoffrey J. Gordon,et al. Bounded real-time dynamic programming: RTDP with monotone upper bounds and performance guarantees , 2005, ICML.
[85] Mohit Singh,et al. How to pay, come what may: approximation algorithms for demand-robust covering problems , 2005, 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS'05).
[86] Geoffrey J. Gordon,et al. Fast Exact Planning in Markov Decision Processes , 2005, ICAPS.
[87] J. M. Bilbao,et al. Contributions to the Theory of Games , 2005 .
[88] Tuomas Sandholm,et al. A Texas Hold'em poker player based on automated abstraction and real-time equilibrium computation , 2006, AAMAS '06.
[89] Stephen P. Boyd,et al. Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.
[90] Andrew J. Schaefer,et al. SPAR: stochastic programming with adversarial recourse , 2006, Oper. Res. Lett..
[91] David S. Leslie,et al. Generalised weakened fictitious play , 2006, Games Econ. Behav..
[92] Peter Bro Miltersen,et al. Computing sequential equilibria for two-player games , 2006, SODA '06.
[93] Tuomas Sandholm,et al. Finding equilibria in large sequential games of imperfect information , 2006, EC '06.
[94] Thomas P. Hayes,et al. Robbing the bandit: less regret in online geometric optimization against an adaptive adversary , 2006, SODA '06.