The Complexity of Optimal Small Policies

We investigate the complexity of problems concerning partially-observable Markov decision processes that are run for a finite number of steps under small policies. Computing the expected sum of rewards of a process under a small policy is shown to be complete for the complexity class PP, which lies between NP and PSPACE. Computing an optimal small policy is shown to be complete for NP^PP. The latter result contrasts with those of Papadimitriou and Tsitsiklis (1987), who showed that this problem is PSPACE-complete if no assumptions are made about the representation of the policy, and that it is P-complete for fully-observable processes.
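To make the evaluation problem concrete, the following sketch computes the exact expected sum of rewards for a hypothetical toy POMDP (two states, two actions, two observations, horizon 2) under a small history-dependent policy. All numbers and the policy are illustrative assumptions, not taken from the paper; the point is that exact evaluation enumerates exponentially many state/observation trajectories, which is what places the problem in PP.

```python
# Hypothetical toy POMDP (illustrative numbers, not from the paper).
# T[s][a][s'] are transition probabilities, O[s][o] observation
# probabilities, R[s][a] immediate rewards.
T = {0: {0: [0.9, 0.1], 1: [0.5, 0.5]},
     1: {0: [0.2, 0.8], 1: [0.5, 0.5]}}
O = {0: [0.8, 0.2], 1: [0.3, 0.7]}
R = {0: {0: 1.0, 1: 0.0},
     1: {0: 0.0, 1: 2.0}}
b0 = [0.6, 0.4]   # initial state distribution
H = 2             # horizon (number of steps)

def policy(obs_history):
    # A "small" policy: a compact rule over the observation history.
    # Here: take action 1 iff the most recent observation was 1.
    return obs_history[-1]

def expected_reward():
    # Enumerate every state/observation trajectory of length H and
    # accumulate probability-weighted rewards. The number of
    # trajectories grows exponentially in H, so this exact sum is a
    # #P-style computation; deciding whether it exceeds a threshold
    # is the PP-complete evaluation problem.
    total = 0.0

    def recurse(step, state, prob, obs_history):
        nonlocal total
        if step == H:
            return
        for o in (0, 1):
            p_o = prob * O[state][o]          # observe o in this state
            hist = obs_history + (o,)
            a = policy(hist)                  # act based on history
            total += p_o * R[state][a]        # collect expected reward
            for s2 in (0, 1):                 # branch on next state
                recurse(step + 1, s2, p_o * T[state][a][s2], hist)

    for s in (0, 1):
        recurse(0, s, b0[s], ())
    return total
```

For this particular instance the enumeration is tiny; the optimal-small-policy problem additionally guesses the policy before evaluating it, which is the source of the NP^PP upper bound.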