A Framework for Dynamic Decision Making by Multi-agent Cooperative Fault Pair Algorithm (MCFPA) in Retail Shop Application

This paper presents a novel framework for dynamic decision making in a retail shop application, based on a proposed improvement to Nash Q-learning using the Fault Pair Algorithm. The approach models three retail shops in a retail market. The shops must support one another to maximize revenue, drawing on cooperative knowledge while each learns its own policy. The suppliers are intelligent agents that use cooperative learning to train in this setting. Under assumptions about each shop's storage plan, restock time, and customer arrival process, the problem is formulated as a Markov decision process, which makes it feasible to develop learning algorithms. The proposed algorithms learn to adapt to a changing market situation. The paper reports results of cooperative reinforcement learning using improved Nash Q-learning by Fault Pair Algorithm for three shop agents over a one-year sales period. Results from two approaches, Nash Q-learning and improved Nash Q-learning by Fault Pair, are compared. Each agent maintains Q-functions over joint actions and updates them according to the Nash equilibrium of the current Q-values. The paper finds that agents using Nash Q-learning aim to reach a jointly optimal path, and that the agents' performance improved after using Fault Pair Nash Q-learning.
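The update described above, in which each agent keeps Q-values over joint actions and moves them toward the Nash equilibrium value of the next state, can be illustrated with a minimal sketch of standard Nash Q-learning. Several simplifications here are assumptions and do not come from the paper: the sketch uses two agents rather than three, searches only for pure-strategy equilibria, and falls back to the joint action with the highest summed Q-value when no pure equilibrium exists. The function names and the learning-rate and discount parameters are hypothetical. The Fault Pair improvement is not specified in the abstract and is not implemented.

```python
import itertools
import numpy as np

def pure_nash(q1, q2):
    """Return a pure-strategy Nash equilibrium (a1, a2) of the stage game
    with payoff matrices q1 and q2 (rows: agent 1, columns: agent 2),
    or None if no pure equilibrium exists."""
    n1, n2 = q1.shape
    for a1, a2 in itertools.product(range(n1), range(n2)):
        best_for_1 = q1[a1, a2] >= q1[:, a2].max()  # agent 1 cannot gain by deviating
        best_for_2 = q2[a1, a2] >= q2[a1, :].max()  # agent 2 cannot gain by deviating
        if best_for_1 and best_for_2:
            return a1, a2
    return None

def nash_q_update(Q1, Q2, s, a1, a2, r1, r2, s_next, alpha=0.1, gamma=0.9):
    """One Nash Q-learning step for two agents.
    Q1 and Q2 have shape (n_states, n_actions_1, n_actions_2)."""
    eq = pure_nash(Q1[s_next], Q2[s_next])
    if eq is None:
        # Assumed fallback (not from the paper): pick the joint action
        # that maximizes the sum of both agents' Q-values.
        flat = np.argmax(Q1[s_next] + Q2[s_next])
        eq = np.unravel_index(flat, Q1[s_next].shape)
    v1, v2 = Q1[s_next][eq], Q2[s_next][eq]  # equilibrium values at s_next
    Q1[s, a1, a2] = (1 - alpha) * Q1[s, a1, a2] + alpha * (r1 + gamma * v1)
    Q2[s, a1, a2] = (1 - alpha) * Q2[s, a1, a2] + alpha * (r2 + gamma * v2)
    return Q1, Q2

# Usage: two states, two actions per agent; state 1 is a prisoner's-dilemma-like game.
Q1 = np.zeros((2, 2, 2))
Q2 = np.zeros((2, 2, 2))
Q1[1] = [[3, 0], [5, 1]]
Q2[1] = [[3, 5], [0, 1]]
Q1, Q2 = nash_q_update(Q1, Q2, s=0, a1=0, a2=0, r1=2.0, r2=4.0,
                       s_next=1, alpha=0.5, gamma=0.9)
```

In the usage example, the only pure equilibrium of the state-1 game is the joint action (1, 1), so both agents bootstrap from its value rather than from their individually best entries. This is the point that distinguishes the Nash Q-update from independent Q-learning.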
