Swarm learning in restricted environments: an examination of semi-stochastic action selection

This paper explores a machine learning process for robotic swarms tasked with a non-trivial problem in restricted environments. We examine the effect of a semi-stochastic action selector in a learning-classifier-based behaviour system by adjusting its stochasticity setting. We utilise Greedy Randomised Adaptive Search Procedures (GRASP) and find some improvement in the swarm's ability in non-deterministic, partially observable environments compared with greedy selection. The swarm also performs significantly worse when machine learning is removed. The study further explores an evolutionary process that optimises the behaviours available to each agent, and examines the effect this process has on the learning settings. Evolution is found to reduce the impact of fine-tuning the learning variables. However, fully stochastic selection prevents learning, which in turn impairs the evolution.
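To make the idea of a stochasticity setting concrete, the following is a minimal sketch of a GRASP-style selector using the common value-based restricted candidate list. It is not taken from the paper: the function name `grasp_select`, the parameter `alpha`, and the use of a list of classifier strengths are all assumptions made for illustration, and the selector actually used in the study may differ.

```python
import random


def grasp_select(strengths, alpha, rng=random):
    """Return the index of an action chosen from a restricted candidate list.

    strengths: hypothetical learned values, one per candidate action.
    alpha: assumed stochasticity setting in [0, 1];
           0.0 is purely greedy, 1.0 is uniform over all actions.
    """
    if not strengths:
        raise ValueError("no candidate actions")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")

    best, worst = max(strengths), min(strengths)
    # Actions within alpha * (best - worst) of the best value are admitted.
    threshold = best - alpha * (best - worst)
    candidates = [i for i, s in enumerate(strengths) if s >= threshold]
    # The best action always satisfies the threshold, so the list is never empty.
    return rng.choice(candidates)
```

Under these assumptions, `alpha = 0.0` reduces to greedy selection, intermediate values give semi-stochastic selection among the stronger actions, and `alpha = 1.0` ignores the learned strengths entirely, which is consistent with the observation that fully stochastic selection prevents learning.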
