Finite-State Controllers Based on Mealy Machines for Centralized and Decentralized POMDPs

Existing controller-based approaches for centralized and decentralized POMDPs represent policies with a type of automaton with output known as a Moore machine, in which each action depends only on the controller's current node. In this paper, we show that several advantages can be gained by using another type of automaton, the Mealy machine, in which each action depends on both the current node and the most recent observation. Mealy machines are more powerful than Moore machines of the same size, provide a richer structure that solution methods can exploit, and can be easily incorporated into current controller-based approaches. To demonstrate this, we adapted several existing controller-based algorithms to use Mealy machines and evaluated them on a set of benchmark domains. The Mealy-based approach always outperformed the Moore-based approach and often outperformed state-of-the-art algorithms for both centralized and decentralized POMDPs. These findings provide fresh and general insights for improving existing algorithms and developing new ones.
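To make the Moore/Mealy distinction concrete, here is a minimal sketch using the textbook deterministic definitions of both automata, with observations as inputs and actions as outputs. It is not the stochastic controller representation or any algorithm from the paper; the class names, the `moore_to_mealy` helper, and the two-observation "echo" policy (act on the most recent observation) are hypothetical illustrations. The example shows the standard conversion of any Moore controller into a Mealy controller with the same nodes, and a policy that a one-node Mealy controller represents but a Moore controller needs two nodes for.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, List, Tuple

Node = Hashable
Obs = Hashable
Action = Hashable


@dataclass
class MooreController:
    """Hypothetical sketch: the action depends only on the node moved to."""
    start: Node
    delta: Dict[Tuple[Node, Obs], Node]  # node transition on each observation
    act: Dict[Node, Action]              # one action attached to each node

    def run(self, observations: List[Obs]) -> List[Action]:
        q, actions = self.start, []
        for o in observations:
            q = self.delta[(q, o)]        # move first...
            actions.append(self.act[q])   # ...then emit the new node's action
        return actions


@dataclass
class MealyController:
    """Hypothetical sketch: the action depends on the node and the observation."""
    start: Node
    delta: Dict[Tuple[Node, Obs], Node]
    act: Dict[Tuple[Node, Obs], Action]  # one action attached to each transition

    def run(self, observations: List[Obs]) -> List[Action]:
        q, actions = self.start, []
        for o in observations:
            actions.append(self.act[(q, o)])  # emit using (node, observation)
            q = self.delta[(q, o)]
        return actions


def moore_to_mealy(m: MooreController) -> MealyController:
    """Standard construction: same nodes, each action moved onto incoming edges."""
    return MealyController(
        start=m.start,
        delta=dict(m.delta),
        act={(q, o): m.act[q2] for (q, o), q2 in m.delta.items()},
    )


OBS = ["left", "right"]

# Hypothetical 'echo' policy: a Moore controller needs one node per observation.
moore_echo = MooreController(
    start="q_left",
    delta={(q, o): "q_" + o for q in ("q_left", "q_right") for o in OBS},
    act={"q_left": "left", "q_right": "right"},
)

# The same policy as a Mealy controller with a single node.
mealy_echo = MealyController(
    start="q",
    delta={("q", o): "q" for o in OBS},
    act={("q", o): o for o in OBS},
)

seq = ["left", "right", "right", "left"]
print(moore_echo.run(seq))                  # ['left', 'right', 'right', 'left']
print(mealy_echo.run(seq))                  # ['left', 'right', 'right', 'left']
print(moore_to_mealy(moore_echo).run(seq))  # ['left', 'right', 'right', 'left']
```

A one-node Moore controller can only ever emit its single node's action, so it cannot reproduce the echo policy; the Mealy controller does so with one node because the action is attached to the transition rather than to the node. This is the sense in which, at a fixed number of nodes, the Mealy form is at least as expressive as the Moore form.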
