Amoeba-inspired Tug-of-War algorithms for exploration-exploitation dilemma in extended Bandit Problem

The true slime mold Physarum polycephalum, a single-celled amoeboid organism, efficiently allocates a constant amount of intracellular resource among its pseudopod-like branches, favoring those that best fit an environment in which dynamic light stimuli are applied. Inspired by this resource allocation process, the authors formulated a concurrent search algorithm, called the Tug-of-War (TOW) model, for maximizing profit in the multi-armed Bandit Problem (BP). A player (gambler) in the BP must decide, as quickly and accurately as possible, which of N slot machines to invest in, and thus faces an "exploration-exploitation dilemma": a trade-off between the speed and accuracy of decision making, which are conflicting objectives. The TOW model maintains a constant total resource volume while collecting environmental information by concurrently expanding and shrinking its branches. This conservation law entails a nonlocal correlation among the branches, i.e., a volume increment in one branch is immediately compensated by volume decrement(s) in the other branch(es). Owing to this nonlocal correlation, the TOW model can manage the dilemma efficiently. In this study, we extend the TOW model to a variant of the BP, the Extended Bandit Problem (EBP), in which the player must select the best M-tuple of the N machines. We demonstrate that the extended TOW model outperforms extended versions of well-known BP algorithms, ϵ-Greedy and SoftMax, on 2-tuple-3-machine and 2-tuple-4-machine instances of the EBP, particularly in terms of short-term decision-making capability, which is essential for the amoeba's survival in a hostile environment.
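The coupling described above — a conserved total "volume" shared by all branches, so that reinforcing one arm implicitly suppresses the others — can be sketched for the basic multi-armed BP as follows. This is an illustrative simplification, not the authors' exact formulation: the function name `tow_bandit`, the fixed penalty weight `omega`, and the Gaussian fluctuation term are assumptions made for demonstration only.

```python
import random

def tow_bandit(probs, steps, omega=1.0, seed=0):
    """Play a TOW-style bandit: each arm's estimate is measured against
    the mean of all arms, so the displacements sum to zero (a discrete
    analogue of the conserved resource volume)."""
    rng = random.Random(seed)
    n = len(probs)
    q = [0.0] * n          # accumulated evidence per arm
    rewards = 0
    for _ in range(steps):
        # Zero-sum displacements: raising one branch lowers the others
        # relative to the shared mean, mimicking volume conservation.
        mean_q = sum(q) / n
        x = [qi - mean_q for qi in q]
        # Small fluctuation keeps some exploration alive.
        arm = max(range(n), key=lambda k: x[k] + rng.gauss(0.0, 0.1))
        if rng.random() < probs[arm]:
            q[arm] += 1.0   # win: branch expands
            rewards += 1
        else:
            q[arm] -= omega # loss: branch shrinks
    return rewards
```

Because the displacements are computed against the common mean, a win on one arm instantly changes the relative standing of every other arm without any explicit cross-arm bookkeeping, which is the point of the nonlocal correlation in the TOW model.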
