Monotonicity Properties for Two-Action Partially Observable Markov Decision Processes on Partially Ordered Spaces

Abstract: This paper investigates monotonicity properties of optimal policies for two-action partially observable Markov decision processes when the underlying (core) state and observation spaces are partially ordered. Motivated by the desirable properties of the monotone likelihood ratio order in imperfect-information settings, namely the preservation of belief ordering under conditioning on any new information, we propose a new stochastic order (a generalization of the monotone likelihood ratio order) that is appropriate when the underlying space is partially ordered. The generalization is non-trivial, requiring additional conditions on the elements of the beliefs corresponding to incomparable pairs of states. The stricter conditions in the proposed stochastic order reflect a conservation of structure in the problem: the loss of structure from relaxing the total ordering of the state space to a partial order must be compensated by stronger conditions on the ordering of beliefs. In addition to the proposed stochastic order, we introduce a class of matrices, termed generalized totally positive of order 2, that is sufficient for preserving the order. Our main result is a set of sufficient conditions ensuring the existence of an optimal policy that is monotone on the belief space with respect to the proposed stochastic order.
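For context, the classical definitions that the abstract's proposed order generalizes are standard on a totally ordered state space $\{1,\dots,n\}$; the following sketch states them in that setting only (the partially ordered generalization is the paper's contribution and is not reproduced here). For beliefs $\pi_1,\pi_2$, the monotone likelihood ratio (MLR) order is
\[
\pi_1 \le_r \pi_2 \iff \pi_1(j)\,\pi_2(i) \le \pi_1(i)\,\pi_2(j) \quad \text{for all } i \le j,
\]
i.e., the likelihood ratio $\pi_2(i)/\pi_1(i)$ is nondecreasing in $i$. A nonnegative matrix $P$ is totally positive of order 2 (TP2) if every $2\times 2$ minor is nonnegative:
\[
P(i_1,j_1)\,P(i_2,j_2) \ge P(i_1,j_2)\,P(i_2,j_1) \quad \text{for all } i_1 \le i_2,\ j_1 \le j_2.
\]
In the totally ordered case, TP2 transition and observation kernels preserve the MLR order of beliefs under Bayesian updating, which is the preservation property the abstract refers to.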
