Markov Decision Processes: a tutorial

There are situations in which decisions must be made in sequence, and the outcome of each decision is not clear to the decision maker. Such situations can be formulated mathematically as Markov decision processes and, given the probabilities of the values resulting from the decisions, it is possible to determine a policy that maximizes the expected value of the sequence of decisions. This tutorial describes Markov decision processes (both the fully observable and the partially observable case) and briefly discusses some methods for their solution. Semi-Markov processes are not discussed.
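The policy-optimization claim above is usually made concrete through the Bellman optimality equation and value iteration. The sketch below is not taken from the tutorial; it is a minimal illustration, assuming a small finite, fully observable MDP in which the transition model P, the state and action counts, and the discount factor gamma are all hypothetical placeholders.

    # Minimal value iteration sketch for a finite, fully observable MDP.
    # P[s][a] is a list of (probability, next_state, reward) triples.
    import numpy as np

    def value_iteration(P, n_states, n_actions, gamma=0.95, tol=1e-8):
        """Return an (approximately) optimal value function and greedy policy."""
        V = np.zeros(n_states)
        while True:
            Q = np.zeros((n_states, n_actions))
            for s in range(n_states):
                for a in range(n_actions):
                    # Expected one-step reward plus discounted future value.
                    Q[s, a] = sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
            V_new = Q.max(axis=1)
            if np.max(np.abs(V_new - V)) < tol:
                return V_new, Q.argmax(axis=1)
            V = V_new

    if __name__ == "__main__":
        # Toy two-state, two-action MDP with made-up numbers.
        P = {
            0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
            1: {0: [(1.0, 1, 2.0)], 1: [(1.0, 0, 0.0)]},
        }
        V, policy = value_iteration(P, n_states=2, n_actions=2)
        print("V* =", V, "policy =", policy)

In the partially observable case the same recursion is applied to probability distributions over states (belief states) rather than to states directly, which is the main source of the additional computational difficulty the tutorial discusses.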
