Identifying tractable decentralized control problems on the basis of information structure

Sequential decomposition of two general models of decentralized systems with non-classical information structures is presented. In model A, all agents have two observations at each step: a common observation that all agents observe and a private observation of their own. The control action of each agent is based on all past common observations, its current private observation, and the contents of its memory. At each step, each agent also updates the contents of its memory. A cost function, which depends on the state of the plant and the control actions of all agents, is given. The objective is to choose control and memory update functions for all agents to minimize either a total expected cost over a finite horizon or a discounted cost over an infinite horizon. In model B, the agents do not have any common observation; the rest of the model is the same as in model A. The key idea of our solution methodology is the following. From the point of view of a fictitious agent that observes all common observations, the system can be viewed as a centralized system with partial observations. This allows us to identify information states and obtain a sequential decomposition. When the system variables take values in finite sets, the optimality equations of the sequential decomposition are similar to those of partially observable Markov decision processes (POMDPs) with finite state and action spaces. For such systems, we can use algorithms for POMDPs to compute optimal designs for models A and B.
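
To make the notion of an information state concrete, the following is a minimal sketch of the belief update for a standard finite POMDP, the kind of recursion that POMDP algorithms operate on. It is not the construction used for models A and B, where the fictitious agent's information state and actions are more elaborate. The function name `belief_update`, the table layout (`trans[a][s][s2]`, `obs[a][s2][o]`), and the two-state numbers are all assumptions made for illustration.

```python
def belief_update(belief, action, observation, trans, obs):
    """One step of the standard POMDP belief (information state) update.

    belief:  list of probabilities over states, summing to 1
    trans:   trans[a][s][s2] = P(next state s2 | state s, action a)   (assumed layout)
    obs:     obs[a][s2][o]   = P(observation o | next state s2, action a)  (assumed layout)
    """
    n = len(belief)
    # Prediction: push the current belief through the transition kernel.
    predicted = [
        sum(belief[s] * trans[action][s][s2] for s in range(n))
        for s2 in range(n)
    ]
    # Correction: weight each predicted state by the observation likelihood.
    unnormalized = [
        predicted[s2] * obs[action][s2][observation] for s2 in range(n)
    ]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    # Normalize so the posterior is a probability distribution.
    return [u / total for u in unnormalized]


# Hypothetical two-state, one-action, two-observation example.
TRANS = {0: [[0.9, 0.1], [0.2, 0.8]]}
OBS = {0: [[0.8, 0.2], [0.3, 0.7]]}
posterior = belief_update([0.5, 0.5], 0, 0, TRANS, OBS)
```

The posterior depends on the history only through the previous belief, the action, and the new observation. That is the property that makes a belief an information state and allows a dynamic program to be written over beliefs rather than over full histories.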
