Combinatorial Reinforcement Learning of Linear Assignment Problems

Recent growing interest in Artificial Intelligence (AI) and platform-based autonomous fleet management systems motivates algorithmic research into new methods for dynamic and large-scale fleet management. At the same time, recent advances in deep and reinforcement learning show promising results on large-scale and complex decision problems and may provide new, context-sensitive benefits for optimization. In this paper, we address a recurring combinatorial optimization problem, commonly known as graph-based pairwise assignment, maximum bipartite cardinality matching, or the min-cost or max-sum assignment problem, by applying reinforcement learning and comparing it with traditional linear programming algorithms. We provide quantitative and qualitative simulation results from solving symmetric and asymmetric bipartite graphs with multiple algorithms. In particular, we compare solutions obtained with CPLEX, the Hungarian (Kuhn-Munkres) algorithm, the Jonker-Volgenant algorithm, and a nearest-neighbor heuristic against reinforcement learning-based approaches such as Q-learning and SARSA. Finally, we show that reinforcement learning can solve small symmetric bipartite maximum matching problems at close to linear programming quality, depending on the available processing time and graph size, but is outperformed on large-scale asymmetric problems by linear programming-based and nearest-neighbor-based algorithms, subject to the constraint of producing conflict-free solutions.
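
To illustrate the kind of comparison described above, the following Python sketch (not the authors' implementation; solver choice, hyperparameters, and all names are illustrative assumptions) contrasts an exact solver, SciPy's linear_sum_assignment (a Jonker-Volgenant-style method), a greedy nearest-neighbor heuristic, and a simple tabular Q-learning agent that builds a conflict-free assignment row by row on a small random cost matrix.

```python
# Minimal sketch: exact LAP solver vs. nearest-neighbor heuristic vs. tabular Q-learning
# on a small symmetric cost matrix. Hyperparameters and structure are illustrative only.
import numpy as np
from scipy.optimize import linear_sum_assignment  # Jonker-Volgenant-style exact solver

def nearest_neighbor(cost):
    """Greedily assign each row to its cheapest still-unused column."""
    n = cost.shape[0]
    free_cols = set(range(n))
    assignment = np.empty(n, dtype=int)
    for i in range(n):
        j = min(free_cols, key=lambda c: cost[i, c])
        assignment[i] = j
        free_cols.remove(j)
    return assignment

def q_learning_assignment(cost, episodes=5000, alpha=0.1, gamma=1.0, eps=0.1, seed=0):
    """Tabular Q-learning: state = row index, action = column.
    Conflict-free assignments are enforced by masking already-used columns;
    rewards are negative costs, so maximizing return minimizes total cost."""
    rng = np.random.default_rng(seed)
    n = cost.shape[0]
    q = np.zeros((n, n))
    for _ in range(episodes):
        used = np.zeros(n, dtype=bool)
        for i in range(n):
            valid = np.flatnonzero(~used)
            if rng.random() < eps:
                j = rng.choice(valid)                      # explore
            else:
                j = valid[np.argmax(q[i, valid])]          # exploit
            reward = -cost[i, j]
            used[j] = True
            if i == n - 1:
                target = reward
            else:
                target = reward + gamma * np.max(q[i + 1, ~used])
            q[i, j] += alpha * (target - q[i, j])
    # Greedy rollout with the learned Q-table, again masking used columns.
    used = np.zeros(n, dtype=bool)
    assignment = np.empty(n, dtype=int)
    for i in range(n):
        valid = np.flatnonzero(~used)
        j = valid[np.argmax(q[i, valid])]
        assignment[i] = j
        used[j] = True
    return assignment

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    cost = rng.random((6, 6))                  # small symmetric instance
    rows, cols = linear_sum_assignment(cost)   # exact optimum
    nn = nearest_neighbor(cost)
    rl = q_learning_assignment(cost)
    print("optimal cost         :", cost[rows, cols].sum())
    print("nearest-neighbor cost:", cost[np.arange(6), nn].sum())
    print("Q-learning cost      :", cost[np.arange(6), rl].sum())
```

On instances this small the Q-learning rollout typically lands near the exact optimum, while on larger or asymmetric matrices the gap to the exact solver and the heuristic widens, which is consistent with the trend reported in the abstract.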
