On the Computational Complexity of Stochastic Controller Optimization in POMDPs

We show that the problem of finding an optimal stochastic blind controller in a Markov decision process is NP-hard. The corresponding decision problem is NP-hard, in PSPACE, and sqrt-sum-hard; hence placing it in NP would imply breakthroughs in long-standing open problems in computer science. Our result establishes that the more general problem of stochastic controller optimization in POMDPs is also NP-hard. Nonetheless, we outline a special case that is convex and admits efficient global solutions.
