On the Expressive Power of Deep Architectures

Deep architectures are families of functions corresponding to deep circuits. Deep Learning algorithms are based on parametrizing such circuits and tuning their parameters so as to approximately optimize some training objective. Whereas training deep architectures was long thought too difficult, several successful algorithms have been proposed in recent years. We review some of the theoretical motivations for deep architectures, as well as some of their practical successes, and propose directions of investigation to address some of the remaining challenges.
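To make the abstract's first two sentences concrete, here is a minimal, self-contained sketch (not from the paper): a depth-2 parametrized "circuit" in plain NumPy whose parameters are tuned by gradient descent on a toy training objective, XOR, a function no single linear layer can compute. The architecture, hyperparameters, and variable names are illustrative assumptions, not anything the paper specifies.

```python
import numpy as np

# Illustrative sketch (not from the paper): a deep architecture as a
# composition of parametrized layers, trained by gradient descent on a
# training objective.

rng = np.random.default_rng(0)

# Toy data: XOR, which a depth-1 linear circuit cannot represent.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Parameters of a depth-2 circuit with tanh hidden units (sizes are arbitrary).
W1 = rng.normal(size=(2, 8))
b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1))
b2 = np.zeros(1)

lr = 0.5
for step in range(2000):
    # Forward pass: compose the layers of the circuit.
    h = np.tanh(X @ W1 + b1)                    # hidden layer
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))    # sigmoid output

    # Training objective: mean squared error (a cross-entropy objective
    # would be equally standard).
    loss = np.mean((p - y) ** 2)
    if step % 500 == 0:
        print(f"step {step:4d}  loss {loss:.4f}")

    # Backward pass: gradients via the chain rule (backpropagation).
    dp  = 2 * (p - y) / len(X)
    dz2 = dp * p * (1 - p)
    dW2 = h.T @ dz2
    db2 = dz2.sum(axis=0)
    dh  = dz2 @ W2.T
    dz1 = dh * (1 - h ** 2)
    dW1 = X.T @ dz1
    db1 = dz1.sum(axis=0)

    # Tune the parameters so as to approximately optimize the objective.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(np.round(p.ravel(), 2))  # should approach [0, 1, 1, 0]
```

The depth matters here: removing the hidden layer leaves a linear model that cannot fit XOR at all, a small instance of the expressive-power gap between shallow and deep circuits that the paper discusses.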
