Sparse Combinatorial Autoencoders

Recent research has shown that unsupervised pretraining often produces well-conditioned neural network initializations that lead to better local optima during training. One common pretraining method hierarchically stacks sparse autoencoders (SAs) and learns the network parameters layer by layer from unlabeled data. Large network sizes and the amount of data required to properly pretrain a deep network make pretraining computationally intensive and often the bottleneck of training. To alleviate this problem, we propose a novel warm-start procedure for the SA capable of rapidly initializing large SAs in parameter regions that yield fast convergence to good local optima. At the heart of our approach lies the sparse combinatorial autoencoder (SCA), a novel method for regularizing neural networks that allows us to train an SA with H features in O(√H) time. We present a comprehensive series of experiments demonstrating the effectiveness of the warm-start procedure, called fast initialization with SCAs (FISCA), on the STL-10 and MNIST datasets. Our experiments consider untied sigmoid and tied soft-rectified SAs of various sizes and demonstrate that FISCA yields significantly reduced training times compared to widely used initialization techniques. For example, on MNIST, FISCA-initialized soft-rectified SAs with 10K hidden neurons converge over 20× faster to notably better local optima than SAs initialized with alternative methods.
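The abstract does not spell out the SCA construction or the O(√H) training scheme, so the sketch below only illustrates the baseline object that FISCA warm-starts: a tied-weight sparse autoencoder with a soft-rectified (softplus) activation and an L1 sparsity penalty on the hidden features, trained on unlabeled data by gradient descent. The function names, the penalty form, and all hyperparameters are assumptions for illustration, not the authors' implementation.

# Minimal sketch of a tied-weight sparse autoencoder (SA), the building block
# that FISCA warm-starts. This is NOT the SCA itself; names and hyperparameters
# are illustrative assumptions.
import numpy as np

def softplus(z):
    # Soft-rectified activation log(1 + exp(z)), computed stably.
    return np.logaddexp(0.0, z)

def sa_loss_and_grad(W, b, c, X, sparsity_weight=0.1):
    """Squared reconstruction error plus an L1 sparsity penalty for a tied SA.

    W : (H, D) weight matrix shared by encoder and decoder (tied weights)
    b : (H,) hidden bias,  c : (D,) visible bias
    X : (N, D) batch of unlabeled examples
    """
    N = X.shape[0]
    Z = X @ W.T + b                # pre-activations, shape (N, H)
    Hid = softplus(Z)              # hidden features
    Xhat = Hid @ W + c             # tied-weight reconstruction, shape (N, D)
    R = Xhat - X
    loss = (0.5 * np.mean(np.sum(R ** 2, axis=1))
            + sparsity_weight * np.mean(np.sum(np.abs(Hid), axis=1)))

    # Backpropagation through the tied decoder and encoder.
    dXhat = R / N                                      # d loss / d Xhat
    dHid = dXhat @ W.T + (sparsity_weight / N) * np.sign(Hid)
    dZ = dHid * (1.0 - np.exp(-Hid))                   # softplus'(z) = sigmoid(z)
    dW = dZ.T @ X + Hid.T @ dXhat                      # encoder + decoder contributions
    db = dZ.sum(axis=0)
    dc = dXhat.sum(axis=0)
    return loss, (dW, db, dc)

# Example: one gradient step on random MNIST-sized inputs (shapes for illustration).
rng = np.random.default_rng(0)
X = rng.random((64, 784))
W, b, c = 0.01 * rng.standard_normal((256, 784)), np.zeros(256), np.zeros(784)
loss, (dW, db, dc) = sa_loss_and_grad(W, b, c, X)
W, b, c = W - 0.01 * dW, b - 0.01 * db, c - 0.01 * dc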
