Zeroth-Order Methods for Convex-Concave Minmax Problems: Applications to Decision-Dependent Risk Minimization

Min-max optimization is emerging as a key framework for analyzing problems of robustness to strategically and adversarially generated data. We propose a random-reshuffling-based, gradient-free Optimistic Gradient Descent-Ascent algorithm for solving convex-concave min-max problems with finite-sum structure. We prove that the algorithm enjoys the same convergence rate as zeroth-order algorithms for convex minimization problems. We further specialize the algorithm to solve distributionally robust, decision-dependent learning problems, where gradient information is not readily available. Through illustrative simulations, we observe that our proposed approach learns models that are simultaneously robust against adversarial distribution shifts and strategic decisions by the data sources, and that it outperforms existing methods from the strategic classification literature.
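To make the described method concrete, the following is a minimal Python sketch of a gradient-free optimistic gradient descent-ascent loop with random reshuffling over the finite-sum components. It assumes a standard two-point spherical-smoothing gradient estimator and a textbook optimistic update; the function names, step size eta, and smoothing radius delta are illustrative assumptions, not the paper's exact algorithm or tuning.

```python
import numpy as np

def zo_grad(f, z, delta, rng):
    """Two-point zeroth-order gradient estimate of f at z:
    g = d * (f(z + delta*u) - f(z - delta*u)) / (2*delta) * u,
    where u is uniform on the unit sphere and d = dim(z).
    (Illustrative estimator; the paper may use a different variant.)"""
    u = rng.standard_normal(z.shape)
    u /= np.linalg.norm(u)
    return z.size * (f(z + delta * u) - f(z - delta * u)) / (2 * delta) * u

def zo_ogda_rr(losses, x0, y0, eta=1e-2, delta=1e-3, epochs=50, seed=0):
    """Zeroth-order optimistic GDA with random reshuffling.
    `losses` is a list of component functions f_i(x, y), each assumed
    convex in x and concave in y; we minimize over x, maximize over y."""
    rng = np.random.default_rng(seed)
    x, y = x0.copy(), y0.copy()
    gx_prev, gy_prev = np.zeros_like(x), np.zeros_like(y)
    for _ in range(epochs):
        # Random reshuffling: one pass through a fresh permutation per epoch.
        for i in rng.permutation(len(losses)):
            f = losses[i]
            gx = zo_grad(lambda x_: f(x_, y), x, delta, rng)
            gy = zo_grad(lambda y_: f(x, y_), y, delta, rng)
            # Optimistic update: extrapolate with the previous estimates.
            x = x - eta * (2 * gx - gx_prev)
            y = y + eta * (2 * gy - gy_prev)
            gx_prev, gy_prev = gx, gy
    return x, y
```

In a decision-dependent setting, each f_i(x, y) would be evaluated by querying the system (e.g., observing the loss under the induced data distribution), which is exactly why only function values, not gradients, are assumed available.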
