A Survey of Collective Intelligence

This chapter presents the science of "COllective INtelligence" (COIN). A COIN is a large multi-agent system in which: i) the agents each run reinforcement learning (RL) algorithms; ii) there is little to no centralized communication or control; iii) there is a provided world utility function that rates the possible histories of the full system. The conventional approach to designing large distributed systems to optimize a world utility does not use agents running RL algorithms. Rather, that approach begins with explicit modeling of the overall system's dynamics, followed by detailed hand-tuning of the interactions between the components to ensure that they "cooperate" as far as the world utility is concerned. This approach is labor-intensive, often results in highly non-robust systems, and usually yields design techniques that have limited applicability. In contrast, with COINs we wish to solve the system design problems implicitly, via the "adaptive" character of the RL algorithms of each of the agents. This COIN approach introduces an entirely new, profound design problem: assuming the RL algorithms are able to achieve high rewards, what reward functions for the individual agents will, when pursued by those agents, result in high world utility? In other words, what reward functions will best ensure that we do not have phenomena like the tragedy of the commons or Braess's paradox? Although still very young, the science of COINs has already resulted in successes in artificial domains, in particular in packet routing, the leader-follower problem, and variants of Arthur's "El Farol bar problem". It is expected that, as it matures, COIN science will not only greatly expand the range of tasks addressable by human engineers, but will also provide much insight into already established scientific fields, such as economics, game theory, and population biology.
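To make the design problem concrete, the following is a minimal toy sketch of a tragedy-of-the-commons setting. It is not taken from the chapter: the pasture model, the constants `CAPACITY` and `N_AGENTS`, and all function names are hypothetical, and simple iterated best response stands in for the RL algorithms a COIN would actually use. It only illustrates how agents that each succeed at maximizing a naively chosen private reward can still leave the world utility well below its best achievable value.

```python
# Toy commons model (all names and numbers are illustrative assumptions).
CAPACITY = 20   # hypothetical pasture capacity
N_AGENTS = 4    # hypothetical number of herders


def value_per_cow(total):
    """Each cow is worth less as the shared pasture gets more crowded."""
    return max(0, CAPACITY - total)


def private_reward(i, herds):
    """Naive reward for agent i: the value of its own herd only."""
    return herds[i] * value_per_cow(sum(herds))


def world_utility(herds):
    """World utility: total value produced by all herds together."""
    return sum(private_reward(i, herds) for i in range(len(herds)))


def best_response(i, herds):
    """Herd size that maximizes agent i's private reward, others held fixed."""
    others = sum(herds) - herds[i]
    return max(range(CAPACITY + 1), key=lambda c: c * value_per_cow(others + c))


def selfish_equilibrium(max_rounds=100):
    """Iterated best response (a stand-in for RL) until no agent changes."""
    herds = [0] * N_AGENTS
    for _ in range(max_rounds):
        previous = list(herds)
        for i in range(N_AGENTS):
            herds[i] = best_response(i, herds)
        if herds == previous:
            break
    return herds


def best_world_utility():
    """Best world utility over every possible total herd size."""
    return max(t * value_per_cow(t) for t in range(CAPACITY + 1))


herds = selfish_equilibrium()
print("selfish herds:", herds)
print("world utility under selfish rewards:", world_utility(herds))
print("best achievable world utility:", best_world_utility())
```

In this sketch every agent does well by its own reward, yet the pasture is over-grazed. The question the chapter poses is which private reward functions would instead steer such self-interested learners toward high world utility.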
