Semantics, Representations and Grammars for Deep Learning

Deep learning is currently the subject of intensive study. However, fundamental concepts such as representations are not formally defined (researchers "know them when they see them"), and there is no common language for describing and analyzing algorithms. This essay proposes an abstract framework that identifies the essential features of current practice and may provide a foundation for future developments. The backbone of almost all deep learning algorithms is backpropagation, which is simply a gradient computation distributed over a neural network. The main ingredients of the framework are thus, unsurprisingly: (i) game theory, to formalize distributed optimization; and (ii) communication protocols, to track the flow of zeroth- and first-order information. The framework allows natural definitions of semantics (as the meaning encoded in functions), representations (as functions whose semantics is chosen to optimize a criterion) and grammars (as communication protocols equipped with first-order convergence guarantees). Much of the essay is spent discussing examples taken from the literature. The ultimate aim is to develop a graphical language for describing the structure of deep learning algorithms that backgrounds the details of the optimization procedure and foregrounds how the components interact. Inspiration is taken from probabilistic graphical models and factor graphs, which capture the essential structural features of multivariate distributions.
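To make "a gradient computation distributed over a neural network" concrete, here is a minimal sketch, assuming a chain of scalar units that each compute `y = tanh(w * x)` and a squared-error loss. It does not reproduce the essay's game-theoretic or protocol formalism. The names `Unit`, `loss_and_grads`, `train` and `finite_difference` are hypothetical and chosen for illustration only. The forward sweep passes zeroth-order information (function values) downstream; the backward sweep passes first-order information (gradients) upstream; each unit then updates its own parameter using only the messages it received. A central finite-difference check is included to confirm the backward messages are correct.

```python
import math
from dataclasses import dataclass


@dataclass
class Unit:
    """Hypothetical one-parameter unit computing y = tanh(w * x)."""

    w: float
    x: float = 0.0  # last input received from upstream
    y: float = 0.0  # last output sent downstream

    def forward(self, x: float) -> float:
        # Zeroth-order message: a function value passed downstream.
        self.x = x
        self.y = math.tanh(self.w * x)
        return self.y

    def backward(self, grad_y: float) -> tuple[float, float]:
        # First-order message: dL/dy arrives from the downstream unit.
        # Local chain rule: d tanh(z)/dz = 1 - y^2, where z = w * x.
        grad_z = grad_y * (1.0 - self.y ** 2)
        grad_w = grad_z * self.x  # gradient for this unit's own parameter
        grad_x = grad_z * self.w  # first-order message sent upstream
        return grad_w, grad_x


def loss_and_grads(units: list[Unit], x: float, target: float) -> tuple[float, list[float]]:
    """Forward sweep of values, then backward sweep of gradients (squared-error loss)."""
    h = x
    for u in units:
        h = u.forward(h)
    loss = 0.5 * (h - target) ** 2
    msg = h - target  # dL/d(output) for the squared-error loss
    grads = []
    for u in reversed(units):
        g_w, msg = u.backward(msg)
        grads.append(g_w)
    grads.reverse()  # align gradients with the forward order of units
    return loss, grads


def train(units: list[Unit], x: float, target: float,
          lr: float = 0.1, steps: int = 100) -> list[float]:
    """Plain gradient descent: each unit updates w from its own local gradient."""
    history = []
    for _ in range(steps):
        loss, grads = loss_and_grads(units, x, target)
        history.append(loss)
        for u, g in zip(units, grads):
            u.w -= lr * g
    return history


def finite_difference(ws: list[float], x: float, target: float,
                      i: int, eps: float = 1e-6) -> float:
    """Central-difference estimate of dL/dw_i, used only to check backward()."""
    def loss_at(weights: list[float]) -> float:
        return loss_and_grads([Unit(w) for w in weights], x, target)[0]

    up, down = list(ws), list(ws)
    up[i] += eps
    down[i] -= eps
    return (loss_at(up) - loss_at(down)) / (2 * eps)


if __name__ == "__main__":
    ws = [0.5, -1.2, 0.8]
    _, grads = loss_and_grads([Unit(w) for w in ws], x=1.0, target=0.3)
    for i, g in enumerate(grads):
        print(f"w{i}: backprop={g:.6f} finite-diff={finite_difference(ws, 1.0, 0.3, i):.6f}")
    history = train([Unit(w) for w in ws], x=1.0, target=0.3, lr=0.1, steps=200)
    print(f"loss: {history[0]:.4f} -> {history[-1]:.6f}")
```

Running the script prints each backward-pass gradient next to its finite-difference estimate (they should agree closely) and the loss before and after 200 descent steps. Scalar units are used only to keep the message passing visible; practical frameworks operate on tensors and derive the backward messages by automatic differentiation.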
