Adversarial Deep Learning for Online Resource Allocation

Online algorithms are an important branch of algorithm design. Designing online algorithms with a bounded competitive ratio (that is, a bound on worst-case performance) can be hard and usually relies on problem-specific assumptions. Inspired by adversarial training in Generative Adversarial Networks (GANs) and by the fact that the competitive ratio of an online algorithm is defined over worst-case inputs, we use deep neural networks to learn an online algorithm for a resource allocation and pricing problem from scratch, with the goal of minimizing the performance gap between the offline optimum and the learned online algorithm on worst-case inputs. Specifically, we use two neural networks, one as the algorithm and one as the adversary, and let them play a zero-sum game: the adversary generates worst-case inputs, while the algorithm learns the best strategy against the inputs provided by the adversary. To ensure better convergence of the algorithm network to the desired online algorithm, we propose a novel per-round update method for sequential decision making. It breaks the complex dependencies among different rounds so that updates can be made for every possible action, rather than only for sampled actions. To the best of our knowledge, our work is the first to use deep neural networks to design an online algorithm from the perspective of worst-case performance guarantees. Empirical studies show that our update methods ensure convergence to a Nash equilibrium and that the learned algorithm outperforms state-of-the-art online algorithms under various settings.
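The algorithm-versus-adversary game described above can be illustrated on a toy problem. The sketch below is not the method of this work: it replaces the two neural networks with explicit mixed strategies over a small grid and replaces gradient training with multiplicative-weights updates. The single-item posted-price setting, the fallback sale at the lowest value, and all names (ratio_matrix, play_zero_sum, value_bounds, eta, rounds) are assumptions made for illustration. As in the per-round idea above, each player is updated using the expected payoff of every action, not only a sampled one.

```python
import math

def ratio_matrix(grid):
    """ratio[i][j] = offline optimum / online revenue when the algorithm posts
    price grid[i] and the adversary's highest buyer value is grid[j].
    Toy assumption: an unsold item is cleared at the lowest value grid[0]."""
    low = grid[0]
    return [[v / (p if v >= p else low) for v in grid] for p in grid]

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

def play_zero_sum(ratio, rounds=2000, eta=0.05):
    """Multiplicative-weights self-play; returns time-averaged mixed strategies.
    The algorithm minimizes the expected ratio, the adversary maximizes it."""
    n, m = len(ratio), len(ratio[0])
    scale = max(max(row) for row in ratio)  # rescale payoffs into [0, 1]
    alg_w, adv_w = [1.0] * n, [1.0] * m
    alg_avg, adv_avg = [0.0] * n, [0.0] * m
    for _ in range(rounds):
        x, y = normalize(alg_w), normalize(adv_w)
        alg_avg = [a + xi / rounds for a, xi in zip(alg_avg, x)]
        adv_avg = [a + yj / rounds for a, yj in zip(adv_avg, y)]
        # Expected ratio of EVERY action against the opponent's current mix.
        alg_loss = [sum(ratio[i][j] * y[j] for j in range(m)) for i in range(n)]
        adv_gain = [sum(ratio[i][j] * x[i] for i in range(n)) for j in range(m)]
        alg_w = normalize([w * math.exp(-eta * l / scale)
                           for w, l in zip(alg_w, alg_loss)])
        adv_w = normalize([w * math.exp(eta * g / scale)
                           for w, g in zip(adv_w, adv_gain)])
    return alg_avg, adv_avg

def value_bounds(ratio, x, y):
    """Lower bound: the algorithm's best pure response to adversary mix y.
    Upper bound: the adversary's best pure response to algorithm mix x."""
    n, m = len(ratio), len(ratio[0])
    upper = max(sum(ratio[i][j] * x[i] for i in range(n)) for j in range(m))
    lower = min(sum(ratio[i][j] * y[j] for j in range(m)) for i in range(n))
    return lower, upper

grid = [2.0 ** k for k in range(5)]  # candidate prices and values: 1, 2, 4, 8, 16
ratio = ratio_matrix(grid)
alg_mix, adv_mix = play_zero_sum(ratio)
lower, upper = value_bounds(ratio, alg_mix, adv_mix)
print("algorithm price mix:", [round(p, 3) for p in alg_mix])
print("worst-case ratio lies in [%.3f, %.3f]" % (lower, upper))
```

The gap between the two bounds measures how far the averaged strategies are from a Nash equilibrium of this toy game; a smaller gap means the averaged price mix is closer to the best achievable worst-case ratio on the grid.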
