Learning A Minimax Optimizer: A Pilot Study

Solving continuous minimax optimization is of broad practical interest, yet notoriously unstable and difficult. This paper introduces the learning-to-optimize (L2O) methodology to minimax problems for the first time and addresses its accompanying unique challenges. We first present Twin-L2O, the first dedicated minimax L2O framework, which consists of two LSTMs that update the min and max variables separately. This decoupled design is found to facilitate learning, particularly when the min and max variables are highly asymmetric. Empirical experiments on a variety of minimax problems corroborate the effectiveness of Twin-L2O. We then discuss a crucial concern of Twin-L2O: its inevitably limited generalizability to unseen optimizees. To address this issue, we present two complementary strategies. The first, Enhanced Twin-L2O, is empirically applicable to general minimax problems and improves L2O training by leveraging curriculum learning. The second, Safeguarded Twin-L2O, is a preliminary theoretical exploration showing that, under some strong assumptions, the convergence of Twin-L2O can be established. We benchmark our algorithms on several testbed problems and compare against state-of-the-art minimax solvers. The code is available at https://github.com/VITA-Group/L2O-Minimax.
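For concreteness, the continuous minimax problems in question take the standard saddle-point form (notation ours, not quoted from the paper):

\[
\min_{x \in \mathcal{X}} \; \max_{y \in \mathcal{Y}} \; f(x, y),
\]

where x is the min variable handled by one learned optimizer, y is the max variable handled by the other, and f is a smooth, possibly nonconvex-nonconcave objective.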

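The decoupled design lends itself to a compact sketch. The snippet below (PyTorch; all class, function, and parameter names are illustrative assumptions, not taken from the released code) unrolls two independent LSTM optimizers, one producing updates for the min variable and one for the max variable, on an objective f(x, y).

```python
# Minimal sketch of a Twin-L2O-style rollout (illustrative, not the released implementation).
import torch
import torch.nn as nn

class LSTMOptimizer(nn.Module):
    """A learned coordinate-wise optimizer: maps a gradient to an update."""
    def __init__(self, hidden_size=20):
        super().__init__()
        self.cell = nn.LSTMCell(input_size=1, hidden_size=hidden_size)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, grad, state=None):
        # Each coordinate of the gradient is treated as a separate LSTM input.
        g = grad.detach().reshape(-1, 1)
        h, c = self.cell(g, state)
        update = self.head(h).reshape(grad.shape)
        return update, (h, c)

def twin_l2o_rollout(f, x, y, min_opt, max_opt, steps=10):
    """Unroll two learned optimizers on a minimax objective f(x, y)."""
    state_x, state_y = None, None
    history = []
    for _ in range(steps):
        value = f(x, y)
        gx, gy = torch.autograd.grad(value, (x, y), create_graph=True)
        dx, state_x = min_opt(gx, state_x)  # learned update for the min player
        dy, state_y = max_opt(gy, state_y)  # learned update for the max player
        x, y = x + dx, y + dy               # signs and magnitudes are learned, not fixed
        history.append(f(x, y))
    return x, y, history

# Example usage on a toy bilinear saddle-point objective f(x, y) = sum(x * y).
min_opt, max_opt = LSTMOptimizer(), LSTMOptimizer()
x0 = torch.randn(2, requires_grad=True)
y0 = torch.randn(2, requires_grad=True)
x_T, y_T, history = twin_l2o_rollout(lambda x, y: (x * y).sum(), x0, y0, min_opt, max_opt)
```

In a full meta-training loop, the recorded objective values (with opposite signs for the two players) would be aggregated into a meta-loss and backpropagated through the unrolled steps to train both LSTMs; the sketch above shows only a single rollout.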