Paper information - On the Computational Complexity of Stochastic Controller Optimization in POMDPs

On the Computational Complexity of Stochastic Controller Optimization in POMDPs

We show that the problem of finding an optimal stochastic "blind" controller in a Markov decision process is NP-hard. The corresponding decision problem is NP-hard, in PSPACE, and SQRT-SUM-hard; hence placing it in NP would imply breakthroughs in long-standing open problems in computer science. Our result establishes that the more general problem of stochastic controller optimization in POMDPs is also NP-hard. Nonetheless, we outline a special case that is convex and admits efficient global solutions.
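To make the object of study concrete, the following is a minimal sketch of how a stochastic blind controller could be evaluated in a small discounted MDP. A blind controller is taken here to mean a single action distribution `pi` used in every state, with no observations. The function name `blind_controller_value`, the two-state toy MDP, the discounted-reward criterion, and the discount factor 0.9 are all assumptions made for illustration; none of them come from the paper.

```python
from typing import List


def blind_controller_value(
    P: List[List[List[float]]],
    R: List[List[float]],
    pi: List[float],
    mu: List[float],
    gamma: float = 0.9,
    iters: int = 2000,
) -> float:
    """Expected discounted return of a blind stochastic controller.

    P[a][s][t]: probability of moving from state s to state t under action a
    R[a][s]:    expected reward for taking action a in state s
    pi[a]:      probability of action a, identical in every state (blind)
    mu[s]:      initial state distribution
    """
    n_actions, n_states = len(P), len(mu)
    # Mix the per-action models with the state-independent action distribution.
    P_pi = [
        [sum(pi[a] * P[a][s][t] for a in range(n_actions)) for t in range(n_states)]
        for s in range(n_states)
    ]
    r_pi = [sum(pi[a] * R[a][s] for a in range(n_actions)) for s in range(n_states)]
    # Iterative policy evaluation: v <- r_pi + gamma * P_pi v.
    v = [0.0] * n_states
    for _ in range(iters):
        v = [
            r_pi[s] + gamma * sum(P_pi[s][t] * v[t] for t in range(n_states))
            for s in range(n_states)
        ]
    return sum(mu[s] * v[s] for s in range(n_states))


# Hypothetical toy MDP: action 0 always leads to state 1, action 1 to state 0.
P = [
    [[0.0, 1.0], [0.0, 1.0]],  # action 0
    [[1.0, 0.0], [1.0, 0.0]],  # action 1
]
R = [
    [1.0, 0.0],  # action 0 is rewarded only in state 0
    [0.0, 1.0],  # action 1 is rewarded only in state 1
]
mu = [1.0, 0.0]  # start in state 0

always_0 = blind_controller_value(P, R, [1.0, 0.0], mu)  # about 1.0
always_1 = blind_controller_value(P, R, [0.0, 1.0], mu)  # about 0.0
mixed = blind_controller_value(P, R, [0.5, 0.5], mu)     # about 5.0
```

In this toy instance the uniformly randomized controller outperforms both deterministic blind controllers, which suggests why stochastic controllers are the natural search space. The value is a rational function of `pi` (it involves the inverse of a matrix that depends on `pi`), which gives some intuition for why the optimization is not a convex problem in general.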
