RMRL: improved regret minimisation techniques using learning automata

ABSTRACT Game theory, one of the most rapidly advancing areas of AI in recent years, originates from the same roots as AI itself. In incomplete-information problems, players are unaware of the other players and their decisions, which makes it necessary to use learning techniques to enhance the decision-making process. This research studies reinforcement learning techniques: regret minimisation (RM) and utility maximisation (UM), two reinforcement learning approaches, are widely applied to such scenarios to reach optimal solutions. Unlike UM, RM techniques enable agents to overcome the shortage of information and improve their choices based on regrets rather than utilities. These two techniques are merged by iteratively applying UM update functions to RM techniques. The main contributions are as follows: first, novel updating methods for RM, based on the UM updates of reinforcement learning approaches, are proposed; these methods refine RM and accelerate regret reduction. Second, several procedures, all relying on RM techniques, are devised for a multi-state predator-prey problem. Third, the extent to which the proposed approach, called RMRL, enhances different RM techniques in this problem is studied. The results support the validity of the RMRL approach in comparison with several UM and RM techniques.
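
To make the RM/UM combination concrete, the following is a minimal illustrative sketch in Python: plain regret matching (in the style of Hart and Mas-Colell) whose policy is then smoothed with a learning-automaton-style linear reward-inaction step standing in for the UM-based refinement. The class name, the smoothing rate la_rate, and the assumption of full (counterfactual) payoff feedback are all illustrative choices, not the exact RMRL updates defined in the paper.

```python
import numpy as np

# Illustrative sketch only: regret matching plus a learning-automaton-style
# smoothing step. The actual RMRL updating rules are specified in the paper
# body; this code only conveys the general shape of combining RM with a
# UM-style (learning automata) update.

class RegretMatcher:
    def __init__(self, n_actions, la_rate=0.1, rng=None):
        self.n = n_actions
        self.cum_regret = np.zeros(n_actions)            # cumulative regrets per action
        self.policy = np.full(n_actions, 1.0 / n_actions)  # current mixed strategy
        self.la_rate = la_rate                           # LA smoothing rate (assumed parameter)
        self.rng = rng or np.random.default_rng()

    def act(self):
        # Sample an action from the current mixed strategy.
        return self.rng.choice(self.n, p=self.policy)

    def update(self, action, payoffs):
        # payoffs[a] is the counterfactual payoff of action a this round
        # (full-information feedback is assumed here for simplicity).
        # 1) Regret-matching step: accumulate regret relative to the chosen action.
        self.cum_regret += payoffs - payoffs[action]
        positive = np.maximum(self.cum_regret, 0.0)
        if positive.sum() > 0:
            rm_policy = positive / positive.sum()
        else:
            rm_policy = np.full(self.n, 1.0 / self.n)
        # 2) LA-style linear reward-inaction smoothing toward the RM policy,
        #    an illustrative stand-in for the UM-based refinement in RMRL.
        self.policy = (1 - self.la_rate) * self.policy + self.la_rate * rm_policy
```

In use, each agent in a repeated game (such as the multi-state predator-prey setting mentioned above) would call act() to choose a move and update() after observing the round's payoffs; the smoothing rate controls how aggressively the learning-automaton step pulls the policy toward the regret-matching distribution.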
