Better Mixing via Deep Representations

It has been hypothesized, and supported with experimental evidence, that deeper representations, when well trained, tend to do a better job at disentangling the underlying factors of variation. We study the following related conjecture: better representations, in the sense of better disentangling, can be exploited to produce Markov chains that mix faster between modes. Consequently, mixing between modes would be more efficient at higher levels of representation. To better understand this, we propose a secondary conjecture: higher-level samples fill the space they occupy more uniformly, and high-density manifolds tend to unfold when represented at higher levels. The paper discusses these hypotheses and tests them experimentally through visualization and through measurements of mixing between modes and of interpolation between samples.
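
As a concrete illustration of the interpolation test mentioned above, the following is a minimal sketch, assuming a hypothetical encode/decode pair mapping to and from the top-level representation of some trained deep model (the names interpolate_paths, encode, and decode are illustrative, not the paper's code). It compares a straight-line interpolation taken in the high-level representation, then decoded, against the same interpolation taken directly in input space.

import numpy as np

def interpolate_paths(x_a, x_b, encode, decode, n_steps=9):
    """Compare pixel-space and representation-space interpolation.

    encode/decode are stand-ins for a trained deep model's maps to and
    from its top-level representation (hypothetical names, not an API
    from the paper).
    """
    h_a, h_b = encode(x_a), encode(x_b)
    alphas = np.linspace(0.0, 1.0, n_steps)
    # Straight line in the high-level representation, decoded back to
    # input space; under the conjecture this stays near the data manifold.
    deep_path = [decode((1 - a) * h_a + a * h_b) for a in alphas]
    # Baseline: the same straight line taken directly in input space,
    # which typically crosses low-density (implausible) regions.
    pixel_path = [(1 - a) * x_a + a * x_b for a in alphas]
    return deep_path, pixel_path

If the conjecture holds, the decoded high-level path should consist of plausible samples throughout, while the pixel-space path blurs through low-density regions between the two inputs.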
