Deep Online Convex Optimization by Putting Forecaster to Sleep

Methods from convex optimization such as accelerated gradient descent are widely used as building blocks for deep learning algorithms. However, the reasons for their empirical success are unclear, since neural network training objectives are not convex and standard convex guarantees do not apply. This paper develops the first rigorous link between online convex optimization and error backpropagation on convolutional networks. The first step is to introduce circadian games, a mild generalization of convex games with similar convergence properties. The main result is that error backpropagation on a convolutional network is equivalent to playing out a circadian game. It follows immediately that the waking-regret of the players in the game (the units in the neural network) controls the overall rate of convergence of the network. Finally, we explore some implications of the results: (i) we give a game-theoretic description of the representations learned by a neural network, (ii) we propose a learning setting at the level of individual units that can be plugged into deep architectures, and (iii) we propose a new approach to adaptive model selection that applies bandit algorithms to choose which players to wake on each round.
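To make the notion of waking-regret concrete, the following is a minimal illustrative sketch, not the paper's construction: a single unit is treated as a sleeping online forecaster that predicts, incurs a convex per-round loss, and performs an online gradient step only on rounds a gating signal marks it awake; its waking-regret is then measured against the best fixed weight vector restricted to those waking rounds. The class name, the squared loss, and the random gate below are all assumptions made for illustration.

import numpy as np

class SleepingForecaster:
    """One unit as a sleeping online learner (illustrative sketch only).

    The unit runs online gradient descent on a convex squared loss, but
    only predicts, accrues loss, and updates on rounds it is awake.
    """

    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)   # the unit's weight vector
        self.lr = lr             # step size for online gradient descent
        self.waking_loss = 0.0   # cumulative loss over waking rounds only

    def round(self, x, target, awake):
        if not awake:
            return None          # asleep: no prediction, no loss, no update
        pred = self.w @ x
        loss = 0.5 * (pred - target) ** 2   # convex per-round loss
        self.waking_loss += loss
        grad = (pred - target) * x          # gradient of the loss w.r.t. w
        self.w -= self.lr * grad            # OGD update on waking rounds
        return pred

# Toy usage: a random gate decides on which rounds the unit plays.
rng = np.random.default_rng(0)
unit = SleepingForecaster(dim=3)
for t in range(100):
    x = rng.normal(size=3)
    target = x @ np.array([1.0, -2.0, 0.5])
    unit.round(x, target, awake=bool(rng.random() < 0.7))
print(unit.waking_loss)

In this sketch the waking rounds are chosen at random; in the paper's setting the wake/sleep pattern of a unit is determined by the network itself, which is what the circadian-game formulation captures.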
