Distributed Adaptive Control: Beyond Single-Instant, Discrete Control Variables

In extensive-form noncooperative game theory, at each instant t each agent i sets its state x_i independently of the other agents, by sampling an associated distribution q_i(x_i). The coupling between the agents arises in the joint evolution of those distributions. Distributed control problems can be cast the same way: the system designer sets aspects of the joint evolution of the distributions to try to optimize the goal for the overall system. Information theory tells us what the separate q_i of the agents are most likely to be if the system is to have a particular expected value of the objective function G(x_1, x_2, ...). One can therefore view the job of the system designer as speeding an iterative process. Each step of that process starts with a specified value of E_q(G), followed by convergence of the q_i to the most likely set of distributions consistent with that value; the target value of E_q(G) is then lowered, and the process repeats. Previous work has elaborated many schemes for implementing this process when the underlying variables x_i all have a finite number of possible values and G does not extend over multiple instants in time. That work also assumes a fixed mapping from agents to control devices, so that statistical independence of the agents' moves means independence of the device states. This paper extends that work to relax all of these restrictions, which broadens its applicability to include continuous spaces and reinforcement learning. It also elaborates how some of that earlier work can be viewed as a first-principles justification of evolution-based search algorithms.
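
The information-theoretic step above can be made concrete with the standard maximum-entropy argument specialized to a product distribution q(x) = \prod_i q_i(x_i); the symbols below (the target value \gamma and the Lagrange multiplier \beta) are notation introduced here as a sketch, following the product-distribution literature rather than text from this abstract, and assume G is to be minimized. The most likely q consistent with E_q(G) = \gamma extremizes the maxent Lagrangian

\[
\mathcal{L}(q) \;=\; \beta\bigl(E_q(G)-\gamma\bigr)\;-\;S(q),
\qquad
S(q)\;=\;-\sum_{x} q(x)\,\ln q(x),
\]

and setting the variation with respect to q_i to zero, with the other agents' distributions q_{(i)} held fixed, gives the Boltzmann-like form

\[
q_i(x_i)\;\propto\;\exp\!\bigl(-\beta\,E_{q_{(i)}}[\,G \mid x_i\,]\bigr).
\]

Lowering the target \gamma corresponds to raising \beta, so the iterative process described above amounts to an annealing schedule over these coupled per-agent distributions.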

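A minimal sketch of the resulting algorithm is given below, in Python with NumPy, for a toy problem with N agents each choosing among K discrete moves: an outer loop raises beta (equivalently, lowers the target E_q(G)), and an inner loop relaxes each q_i toward the Boltzmann form above, estimating E(G | x_i) by Monte Carlo over the other agents' current distributions. All names and parameter choices here (N, K, the random table defining G, the beta schedule, the damping factor) are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 3, 4                               # toy sizes: N agents, K moves each
G_table = rng.standard_normal((K,) * N)   # toy team objective G(x_1, ..., x_N)

def G(x):
    """World cost for the joint move x (a length-N sequence of move indices)."""
    return G_table[tuple(x)]

# Each agent i holds an independent distribution q_i over its K moves.
q = [np.full(K, 1.0 / K) for _ in range(N)]

def expected_G_given_move(i, xi, q, samples=500):
    """Monte Carlo estimate of E(G | x_i = xi), with the other agents
    sampling their current q_j independently."""
    total = 0.0
    for _ in range(samples):
        x = [rng.choice(K, p=q[j]) for j in range(N)]
        x[i] = xi
        total += G(x)
    return total / samples

# Outer loop: raise beta, i.e. lower the target value of E_q(G).
# Inner loop: relax each q_i toward the most likely (Boltzmann) distribution
# consistent with the current target.
for beta in [0.5, 1.0, 2.0, 4.0, 8.0]:
    for _ in range(10):                       # inner relaxation sweeps
        for i in range(N):
            e = np.array([expected_G_given_move(i, xi, q) for xi in range(K)])
            target = np.exp(-beta * (e - e.min()))
            target /= target.sum()
            q[i] = 0.5 * q[i] + 0.5 * target  # damped update for stability
    est = np.mean([G([rng.choice(K, p=q[j]) for j in range(N)])
                   for _ in range(2000)])
    print(f"beta={beta:4.1f}  estimated E_q(G) = {est:.3f}")
```

The damped parallel update is just one simple way to stabilize this fixed-point iteration; gradient-style updates on the maxent Lagrangian are another common choice in the product-distribution literature.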