Constrained Combinatorial Optimization with Reinforcement Learning

This paper presents a framework for tackling constrained combinatorial optimization problems with deep Reinforcement Learning (RL). To this end, we extend Neural Combinatorial Optimization (NCO) theory to handle constraints in its formulation. Specifically, we propose modeling constrained combinatorial problems as fully observable Constrained Markov Decision Processes (CMDPs), in which the solution is constructed iteratively through interactions with the environment. In addition to the reward signal, the model relies on penalty signals generated from constraint dissatisfaction to infer a policy that acts as a heuristic algorithm. Moreover, having access to the complete state representation during the optimization process allows us to rely on memory-less architectures, improving on the results obtained by previous sequence-to-sequence approaches. Experiments conducted on the constrained Job Shop and Resource Allocation problems demonstrate the superiority of the proposal at rapidly computing solutions compared with classical heuristic, metaheuristic, and Constraint Programming (CP) solvers.
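
To make the penalized objective concrete, the following is a minimal sketch (not the authors' implementation) of a REINFORCE-style update in PyTorch, where constraint-violation magnitudes are subtracted from the reward through penalty coefficients, in the Lagrangian-relaxation spirit the abstract describes. All names, tensor dimensions, and the fixed penalty coefficients are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Memory-less policy: maps the fully observable state to action logits."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def penalized_return(rewards, violations, lambdas):
    # Penalized return: sum_t [ r_t - sum_i lambda_i * c_{i,t} ], where c_{i,t}
    # is the dissatisfaction magnitude of constraint i at step t.
    return (rewards - violations @ lambdas).sum()

# One illustrative REINFORCE update on dummy trajectory data.
torch.manual_seed(0)
policy = PolicyNet(state_dim=8, n_actions=4)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

states = torch.randn(10, 8)          # T=10 fully observable states
rewards = torch.randn(10)            # reward signal from the environment
violations = torch.rand(10, 2)       # dissatisfaction of 2 constraints per step
lambdas = torch.tensor([0.5, 1.0])   # fixed penalty coefficients (assumed)

dist = torch.distributions.Categorical(logits=policy(states))
actions = dist.sample()
ret = penalized_return(rewards, violations, lambdas)

loss = -ret * dist.log_prob(actions).sum()  # REINFORCE on the penalized return
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In practice the penalty coefficients could also be adapted during training rather than held fixed; they are constants here only to keep the sketch short.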
