Solving Combinatorial Problems through Off-Policy Reinforcement Learning Methods

In recent years, reinforcement learning (RL) has emerged as a strong candidate for solving many decision-making problems. It combines deep neural networks with control engineering practices to reach human-level or even superior competency in decision-making systems. This work extends that idea and demonstrates the applicability of RL algorithms to combinatorial puzzles such as the river-crossing puzzle and seating arrangement problems. To that end, it applies quality-networks (QN), deep-quality-networks (DQN), and double-deep-quality-networks (DDQN) to these environments and evaluates them by analyzing reward and performance plots. The study concludes that all three methods completed the puzzle tasks effectively, with negligible performance differences on the given environments.
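To make the setting concrete, the following is a minimal sketch of off-policy tabular Q-learning on one river-crossing puzzle. It is not the authors' implementation: the abstract does not specify the puzzle variant, state encoding, rewards, or hyperparameters, so the wolf-goat-cabbage variant, the four-bit state tuple, the reward values (+10 goal, -10 unsafe bank, -1 per crossing), and every name below (`step`, `train`, `greedy_rollout`) are assumptions chosen for illustration.

```python
import random

# Assumed puzzle variant: farmer, wolf, goat, cabbage; boat carries one item.
ITEMS = ("wolf", "goat", "cabbage")
ACTIONS = (None,) + ITEMS          # None = farmer crosses alone
START = (0, 0, 0, 0)               # bank (0 or 1) of farmer, wolf, goat, cabbage
GOAL = (1, 1, 1, 1)


def unsafe(state):
    """The goat is left without the farmer next to the wolf or the cabbage."""
    farmer, wolf, goat, cabbage = state
    return goat != farmer and (wolf == goat or goat == cabbage)


def step(state, action):
    """Apply one crossing and return (next_state, reward, done)."""
    farmer = state[0]
    banks = list(state)
    if action is not None:
        idx = 1 + ITEMS.index(action)
        if banks[idx] != farmer:   # item is on the other bank: wasted move
            return state, -1.0, False
        banks[idx] = 1 - farmer
    banks[0] = 1 - farmer
    nxt = tuple(banks)
    if unsafe(nxt):
        return nxt, -10.0, True    # assumed penalty, episode ends
    if nxt == GOAL:
        return nxt, 10.0, True     # assumed goal reward
    return nxt, -1.0, False        # assumed per-crossing cost


def train(episodes=5000, alpha=0.5, gamma=0.9, epsilon=0.2, max_steps=50, seed=0):
    """Epsilon-greedy behavior policy, greedy (max) bootstrap target."""
    rng = random.Random(seed)
    q = {}                         # (state, action) -> value, default 0.0
    for _ in range(episodes):
        state = START
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
            nxt, reward, done = step(state, action)
            best_next = 0.0 if done else max(q.get((nxt, a), 0.0) for a in ACTIONS)
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state = nxt
            if done:
                break
    return q


def greedy_rollout(q, max_steps=20):
    """Follow the learned greedy policy from the start state."""
    state, path = START, []
    for _ in range(max_steps):
        action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
        state, _, done = step(state, action)
        path.append(action)
        if done:
            break
    return state, path


if __name__ == "__main__":
    final_state, path = greedy_rollout(train())
    print(final_state, path)
```

The update bootstraps from the maximum next-state value regardless of which action the epsilon-greedy behavior policy takes next, which is what makes the method off-policy. DQN and DDQN replace the table `q` with a neural network; that step is not shown here.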
