Deep Neural Networks Are Congestion Games: From Loss Landscape to Wardrop Equilibrium and Beyond

The theoretical analysis of deep neural networks (DNNs) is arguably among the most challenging research directions in machine learning (ML) today, as it requires scientists to lay novel statistical learning foundations that explain the behaviour of DNNs in practice. While some recent progress has been made in this endeavour, the question of whether DNNs can be analyzed with tools from scientific fields outside the ML community has not received the attention it deserves. In this paper, we explore the interplay between DNNs and game theory (GT) and show how one can benefit from classic, readily available results from the latter when analyzing the former. In particular, we consider the widely studied class of congestion games and illustrate their intrinsic connection to both linear and non-linear DNNs and to the properties of their loss surfaces. Beyond recovering state-of-the-art results from the literature, we argue that our work provides a promising new tool for analyzing DNNs, and we support this claim by proposing concrete open problems whose resolution would significantly advance our understanding of DNNs.
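
Since the abstract leans on game-theoretic notions that may be unfamiliar to an ML audience, the following is a minimal refresher, using standard definitions from the congestion-games literature rather than constructions specific to this paper. In an atomic congestion game, each player $i$ chooses a subset $s_i \subseteq E$ of resources, each resource $e$ has a cost $c_e(k)$ depending on the number $k$ of players using it, and player $i$ pays $C_i(s) = \sum_{e \in s_i} c_e(n_e(s))$, where $n_e(s) = |\{\, j : e \in s_j \,\}|$. Rosenthal's classic potential

    \Phi(s) \;=\; \sum_{e \in E} \sum_{k=1}^{n_e(s)} c_e(k)

satisfies $\Phi(s_i', s_{-i}) - \Phi(s) = C_i(s_i', s_{-i}) - C_i(s)$ for every unilateral deviation, so every local minimum of $\Phi$ is a pure-strategy Nash equilibrium; this is the sense in which a loss function can play the role of a game potential. In the nonatomic limit (a continuum of infinitesimal players), a feasible flow $f = (f_e)_{e \in E}$ is a Wardrop equilibrium, meaning no used path can lower its cost by rerouting, exactly when it minimizes the Beckmann potential

    \Phi(f) \;=\; \sum_{e \in E} \int_0^{f_e} c_e(t)\, dt,

which is the equilibrium notion referenced in the title.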
