Collective Intelligence, Data Routing and Braess' Paradox

We consider the problem of designing the utility functions of the utility-maximizing agents in a multi-agent system (MAS) so that they work synergistically to maximize a global utility. The particular problem domain we explore is the control of network routing by placing agents on all the routers in the network. Conventional approaches to this task have all the agents use the Ideal Shortest Path Algorithm (ISPA). We demonstrate that in many cases, due to the side-effects of one agent's actions on another agent's performance, having agents use ISPAs is suboptimal as far as global aggregate cost is concerned, even when they are only used to route infinitesimally small amounts of traffic. Intuitively speaking, the utility functions of the individual agents are not "aligned" with the global utility. As a particular example of this we present an instance of Braess' paradox, in which adding new links to a network whose agents all use the ISPA results in a decrease in overall throughput. We also demonstrate that load balancing, in which the agents' decisions are collectively made to optimize the global cost incurred by all traffic currently being routed, is suboptimal as far as global cost averaged across time is concerned. This too is due to "side-effects", in this case of current routing decisions on future traffic. The mathematics of Collective Intelligence (COIN) is concerned precisely with avoiding such deleterious side-effects in multi-agent systems, both across time and space. We present key concepts from that mathematics and use them to derive an algorithm whose ideal version should outperform having all agents use the ISPA, even in the infinitesimal-traffic limit. We present experiments verifying this, and also showing that a machine-learning-based version of this COIN algorithm, in which costs are only imprecisely estimated via empirical means (a version potentially applicable in the real world), also outperforms the ISPA, despite having access to less information than the ISPA does. In particular, this COIN algorithm almost always avoids Braess' paradox.
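
To make the Braess effect concrete, below is a minimal Python sketch of the classic four-node Braess construction (a standard textbook example, not necessarily the specific network studied in the paper). One unit of infinitely divisible traffic flows from s to t; under selfish shortest-path (ISPA-style) routing, adding a zero-latency shortcut raises the equilibrium cost incurred by every agent from 1.5 to 2.

    # Classic four-node Braess network: one unit of traffic from s to t.
    # Variable edges (s->A and B->t) have latency l(x) = x, where x is the
    # fraction of traffic on the edge; constant edges (A->t and s->B) have
    # latency 1. The added shortcut A->B has latency 0.

    def cost_without_shortcut():
        # Selfish shortest-path equilibrium: traffic splits evenly over
        # the two symmetric paths s->A->t and s->B->t.
        x = 0.5
        return x + 1.0          # per-unit path cost: l(x) + 1 = 1.5

    def cost_with_shortcut():
        # With the zero-latency edge A->B, the path s->A->B->t is never
        # worse than either alternative (x + 0 + x <= x + 1 for x <= 1),
        # so at equilibrium all traffic takes it and both variable edges
        # carry the full unit.
        x = 1.0
        return x + 0.0 + x      # per-unit path cost: 2.0

    print("equilibrium cost, no shortcut:  ", cost_without_shortcut())  # 1.5
    print("equilibrium cost, with shortcut:", cost_with_shortcut())     # 2.0

The COIN approach referenced above counters such misalignment by assigning each agent a private utility derived from the world utility. The following sketch assumes the Wonderful Life Utility (WLU) form from the COIN literature, in which an agent's payoff is the world utility minus the world utility with that agent "clamped" out; the names and the quadratic link-cost model here are illustrative assumptions, not the paper's notation.

    from collections import Counter

    def aggregate(action_profile):
        # Traffic per link, assuming each agent routes one unit of
        # traffic over the single link it chooses.
        return Counter(action_profile.values())

    def global_cost(loads):
        # World cost G: total delay under an assumed convex per-link
        # cost of load**2.
        return sum(load * load for load in loads.values())

    def wonderful_life_cost(agent, action_profile):
        # WLU-style private cost: G(z) minus G(z with this agent's
        # traffic clamped out). An agent minimizing this quantity pays
        # exactly its marginal contribution to the global cost, so its
        # utility is "aligned" with G.
        clamped = {a: link for a, link in action_profile.items() if a != agent}
        return global_cost(aggregate(action_profile)) - global_cost(aggregate(clamped))

    profile = {"r1": "link_a", "r2": "link_a", "r3": "link_b"}
    print(wonderful_life_cost("r1", profile))  # G drops 5 -> 2, so r1 pays 3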
