A distributed joint-learning and auction algorithm for target assignment

We consider an agent-target assignment problem in an unknown environment modeled as an undirected graph. Agents incur a cost or receive a reward while traveling along the edges of this graph. The agents know neither the graph nor the locations of the targets on it. However, they can obtain local information about both through local sensing and by communicating with other agents within a limited range. To solve this problem, we propose a new distributed algorithm that integrates Q-learning and a distributed auction. The Q-learning component estimates the assignment benefit of each agent-target pair, obtained by summing the rewards over the graph edges traveled. The auction component assigns agents to targets in a distributed fashion. The algorithm is shown to terminate in finite time with a near-optimal assignment. Here, optimality refers to maximizing the total assignment benefit, which may depend both on a value associated with each target-agent pair and on the agent's routing cost to visit the target.
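The abstract does not give the algorithm's details. The following is a minimal, centralized Python sketch of the two ingredients it names, and every choice in it is an assumption for illustration rather than the paper's method. The graph, agent positions, and target values (`EDGE_REWARDS`, `AGENT_STARTS`, `TARGET_NODES`, `TARGET_VALUES`) are hypothetical, and so are the helper names. Edge rewards are taken to be negative travel costs, which keeps undiscounted returns finite. Plain tabular Q-learning is run toward each target separately. The benefit of agent i for target j is modeled as the target's value plus the learned sum of edge rewards from the agent's start node to that target. These benefits are then fed to Bertsekas' classic ε-auction, which runs on a single machine here rather than as the distributed, communication-limited auction described above. For a square problem with n agents, the classic auction terminates with an assignment whose total benefit is within nε of the optimum, and the brute-force check illustrates that bound.

```python
import itertools
import random

# Hypothetical toy environment (not from the paper): undirected graph whose
# edge rewards are negative, i.e. travel costs.
EDGE_REWARDS = {(0, 1): -1.0, (1, 2): -2.0, (2, 3): -1.0, (3, 4): -3.0,
                (4, 5): -1.0, (5, 0): -2.0, (1, 4): -4.0}
AGENT_STARTS = [0, 2, 5]          # hypothetical start node of each agent
TARGET_NODES = [1, 3, 4]          # hypothetical node of each target
TARGET_VALUES = [10.0, 8.0, 6.0]  # hypothetical value of reaching each target


def adjacency(edge_rewards):
    """Build an undirected adjacency list: node -> [(neighbor, reward), ...]."""
    adj = {}
    for (u, v), r in edge_rewards.items():
        adj.setdefault(u, []).append((v, r))
        adj.setdefault(v, []).append((u, r))
    return adj


def learn_returns(adj, target, episodes=2000, alpha=1.0, explore=0.3,
                  max_steps=30, seed=0):
    """Tabular Q-learning toward one absorbing target (undiscounted).

    Returns V(s) = max_a Q(s, a), the learned best sum of edge rewards from s
    to the target. alpha = 1 is adequate because transitions are deterministic.
    """
    rng = random.Random(seed)
    q = {(s, v): 0.0 for s in adj for v, _ in adj[s]}  # optimistic start

    def value(s):
        return 0.0 if s == target else max(q[(s, w)] for w, _ in adj[s])

    nodes = list(adj)
    for _ in range(episodes):
        s = rng.choice(nodes)
        for _ in range(max_steps):
            if s == target:
                break
            if rng.random() < explore:           # epsilon-greedy exploration
                nxt, r = rng.choice(adj[s])
            else:                                # greedy w.r.t. current Q
                nxt, r = max(adj[s], key=lambda a: q[(s, a[0])])
            q[(s, nxt)] += alpha * (r + value(nxt) - q[(s, nxt)])
            s = nxt
    return {s: value(s) for s in adj}


def benefit_matrix(adj):
    """benefit[i][j] = value of target j + learned reward of agent i's route."""
    returns = [learn_returns(adj, t, seed=k) for k, t in enumerate(TARGET_NODES)]
    return [[TARGET_VALUES[j] + returns[j][start] for j in range(len(TARGET_NODES))]
            for start in AGENT_STARTS]


def auction(benefit, eps):
    """Classic (centralized, Gauss-Seidel) epsilon-auction for a square problem."""
    n = len(benefit)
    prices = [0.0] * n
    owner = [None] * n        # owner[j] = agent currently holding target j
    assigned = [None] * n     # assigned[i] = target held by agent i
    unassigned = list(range(n))
    while unassigned:
        i = unassigned.pop()
        net = [benefit[i][j] - prices[j] for j in range(n)]
        best = max(range(n), key=lambda j: net[j])
        second = max((net[j] for j in range(n) if j != best), default=net[best])
        prices[best] += net[best] - second + eps   # the bid raises the price
        if owner[best] is not None:                # previous owner is outbid
            assigned[owner[best]] = None
            unassigned.append(owner[best])
        owner[best], assigned[i] = i, best
    return assigned, prices


def brute_force_best(benefit):
    """Optimal total benefit by enumeration (small problems only)."""
    n = len(benefit)
    return max(sum(benefit[i][p[i]] for i in range(n))
               for p in itertools.permutations(range(n)))


if __name__ == "__main__":
    b = benefit_matrix(adjacency(EDGE_REWARDS))
    eps = 0.01
    assignment, prices = auction(b, eps)
    total = sum(b[i][assignment[i]] for i in range(len(b)))
    print("assignment:", assignment, "total:", total)
    print("within n*eps of optimum:", total >= brute_force_best(b) - len(b) * eps)
```

Shrinking `eps` tightens the nε gap at the cost of more bidding rounds. Learning all benefits first and auctioning afterwards is a simplification made for readability. The algorithm described above integrates the two steps, and its agents rely only on local sensing and limited-range communication.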
