An Efficient Multi-Agent Q-learning Method Based on Observing the Adversary Agent State Change

For tasks modeled as Markov decision processes, this paper presents a novel multi-agent reinforcement learning method based on observing changes in the adversary agent's state. The learning agents observe the adversary agent's state changes and treat them as observations of the environment. This extends their learning episodes and lets them gather more observations with fewer actions. In the extreme case, the learning agents can treat the adversary agent's state changes as their own exploration policy, which allows them to rely on exploitation to obtain maximal reward during learning. Cooperation among the learning agents is achieved through both direct communication and indirect media communication, so this paper also describes the low-cost features of both communication methods as used in the proposed learning method. Direct communication enhances the learning agents' ability to observe the task environment, and indirect media communication helps them derive the optimal action policy efficiently. Simulation results on the hunter game demonstrate the efficiency of the proposed method.
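The following Python sketch roughly illustrates how an observed adversary move might yield "more observations with fewer actions." It assumes the standard one-step tabular Q-learning update. Everything beyond that update is a hypothetical illustration and not the paper's actual method. That includes the `TabularQ` class, the `observe_adversary_move` helper, the encoding of a hunter's state as its relative offset to the prey, the `stay` action, and the `alpha`/`gamma` values. In this sketch, a move made only by the adversary changes the learner's relative state, and that change is logged as an extra transition without the learner spending an action of its own.

```python
from collections import defaultdict


class TabularQ:
    """Hypothetical tabular Q-learner; states and actions must be hashable."""

    def __init__(self, actions, alpha=0.1, gamma=0.9):
        self.actions = list(actions)
        self.alpha = alpha    # learning rate (illustrative value)
        self.gamma = gamma    # discount factor (illustrative value)
        self.q = defaultdict(float)  # (state, action) -> estimated value, default 0.0

    def best_action(self, state):
        # Greedy (pure exploitation) choice; ties go to the earliest action in the list.
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, s, a, r, s_next):
        # Standard one-step Q-learning backup:
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_b Q(s',b) - Q(s,a))
        target = r + self.gamma * max(self.q[(s_next, b)] for b in self.actions)
        self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])

    def observe_adversary_move(self, s, s_next, r=0.0, stay_action="stay"):
        # ASSUMPTION (not from the paper): when only the adversary moves, the
        # learner's relative state changes while it effectively "stays", so the
        # observed change is recorded as one extra transition at no action cost.
        if stay_action not in self.actions:
            raise ValueError("stay_action must be one of the learner's actions")
        self.update(s, stay_action, r, s_next)


# Usage: state = hypothetical (dx, dy) offset from hunter to prey.
agent = TabularQ(["N", "S", "E", "W", "stay"], alpha=0.5, gamma=0.9)
agent.update((1, 0), "E", 1.0, (0, 0))        # hunter acts and catches the prey
agent.observe_adversary_move((2, 0), (1, 0))  # prey moves closer; hunter did not act
```

The extra transition reuses the same backup rule, so its value propagates from the capture reward without any additional hunter action. Whether the paper credits such observations to a "stay" action or handles them differently is not stated in the abstract.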
