Cooperative Multi-Agent Reinforcement Learning for Multi-Component Robotic Systems: Guidelines for Future Research

Reinforcement Learning (RL) aims to develop algorithms that train an agent to achieve a goal optimally with minimal feedback about the desired behavior, which is never specified precisely. The environment returns scalar rewards in response to the agent's actions, endorsing or penalizing them. RL algorithms have been applied successfully to robot control design. Extending the RL paradigm to the design of control systems for Multi-Component Robotic Systems (MCRS) poses new challenges, chiefly the exponential growth of the state space, coordination among agents, and the propagation of rewards among them. In this paper, we identify the main open issues, which offer opportunities to develop innovative solutions towards fully scalable cooperative multi-agent systems.
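The reward-driven learning loop described above, and the scaling problem that motivates the paper, can be illustrated with a minimal tabular Q-learning sketch. This is a generic textbook illustration, not the method of any surveyed paper; the chain MDP, its parameters, and the helper names are hypothetical choices made for the example. Note the final lines: the size of a naive joint Q-table grows exponentially with the number of agents, which is exactly the scalability obstacle the abstract highlights.

```python
import random

def q_learning(n_states, n_actions, step, episodes, alpha=0.5, gamma=0.9,
               eps=0.3, max_steps=100, seed=0):
    """Tabular Q-learning: the agent receives only scalar rewards as feedback."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            # Epsilon-greedy action selection, with random tie-breaking.
            if rng.random() < eps:
                a = rng.randrange(n_actions)
            else:
                best = max(q[s])
                a = rng.choice([x for x in range(n_actions) if q[s][x] == best])
            s2, r, done = step(s, a)
            # Bootstrap from the next state unless the episode terminated.
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if done:
                break
    return q

# Toy chain MDP (hypothetical): 5 states, actions 0 = left, 1 = right,
# scalar reward 1 only on reaching the rightmost (goal) state.
def chain_step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(4, s + 1)
    return s2, (1.0 if s2 == 4 else 0.0), s2 == 4

q = q_learning(n_states=5, n_actions=2, step=chain_step, episodes=300)
# Greedy policy over the learned table: it should move right in every
# non-terminal state.
policy = [max(range(2), key=lambda a: q[s][a]) for s in range(4)]

# The multi-agent obstacle: a centralized joint-action Q-table over N agents,
# each with A individual actions, needs A**N action columns per joint state.
n_agents, n_actions = 8, 4
joint_action_columns = n_actions ** n_agents  # 65536 columns for just 8 agents
```

A single-agent table like this is tractable; the last two lines show why the same centralized formulation breaks down for an MCRS, motivating the decomposition, coordination, and reward-propagation techniques the paper surveys.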
