Markov Decision Processes: a tutorial

There are situations in which decisions must be made in sequence, and the outcome of each decision is not clear to the decision maker. Such situations can be formulated mathematically as Markov decision processes and, given the probabilities of the values resulting from the decisions, it is possible to determine a policy that maximizes the expected value of the sequence of decisions. This tutorial describes Markov decision processes (both the fully observable and the partially observable case) and briefly discusses some methods for their solution. Semi-Markov processes are not discussed.
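The policy-optimization claim above is usually made concrete through the Bellman optimality equation and value iteration. The sketch below is not taken from the tutorial; it is a minimal illustration, assuming a small finite, fully observable MDP in which the transition model P, the state and action counts, and the discount factor gamma are all hypothetical placeholders.

    # Minimal value iteration sketch for a finite, fully observable MDP.
    # P[s][a] is a list of (probability, next_state, reward) triples.
    import numpy as np

    def value_iteration(P, n_states, n_actions, gamma=0.95, tol=1e-8):
        """Return an (approximately) optimal value function and greedy policy."""
        V = np.zeros(n_states)
        while True:
            Q = np.zeros((n_states, n_actions))
            for s in range(n_states):
                for a in range(n_actions):
                    # Expected one-step reward plus discounted future value.
                    Q[s, a] = sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
            V_new = Q.max(axis=1)
            if np.max(np.abs(V_new - V)) < tol:
                return V_new, Q.argmax(axis=1)
            V = V_new

    if __name__ == "__main__":
        # Toy two-state, two-action MDP with made-up numbers.
        P = {
            0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
            1: {0: [(1.0, 1, 2.0)], 1: [(1.0, 0, 0.0)]},
        }
        V, policy = value_iteration(P, n_states=2, n_actions=2)
        print("V* =", V, "policy =", policy)

In the partially observable case the same recursion is applied to probability distributions over states (belief states) rather than to states directly, which is the main source of the additional computational difficulty the tutorial discusses.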
