A novel efficient method for training sparse auto-encoders

The success of machine learning algorithms generally depends on data representation. There is a large body of literature on unsupervised feature learning and on the joint training of deep networks, but little specific guidance on combining hand-designed features, or operations on them, with features learned in an unsupervised way. In this paper, using the MNIST (Modified National Institute of Standards and Technology) handwritten digit database as an example, we propose a novel method for training sparse auto-encoders. We first learn a small set of features through training, then generate more features through operations such as rotation and translation, and finally fine-tune the network on the whole dataset. This approach avoids optimizing the cost function over all nodes, as the traditional sparse auto-encoder training process does, which is very time-consuming. Simulation results show that the proposed method speeds up training by over 50% while keeping recognition accuracy at the same level or better. These findings also contribute to the field's understanding of sparse representation: large-scale sparse features can be generated from small-scale sparse features.
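As a minimal sketch of the feature-expansion step described above, the following code assumes the learned small-scale features are stored as an array of 28x28 decoder weights and enlarges the bank with rotated and translated copies via scipy.ndimage. The names base_features and expand_features, along with the specific angle and shift grids, are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import rotate, shift

def expand_features(base_features, angles=(-10, 10), shifts=((0, 1), (1, 0))):
    """Generate a larger feature bank by rotating and translating
    a small set of features learned by a sparse auto-encoder."""
    expanded = [f for f in base_features]  # keep the original features
    for f in base_features:
        for a in angles:  # rotated copies, kept at the original size
            expanded.append(rotate(f, a, reshape=False, order=1, mode='nearest'))
        for dy, dx in shifts:  # translated copies
            expanded.append(shift(f, (dy, dx), order=0, mode='constant'))
    return np.stack(expanded)

# Stand-in for 20 small-scale features trained on MNIST patches.
base_features = np.random.randn(20, 28, 28)
feature_bank = expand_features(base_features)
print(feature_bank.shape)  # (100, 28, 28): originals + 2 rotations + 2 shifts each
```

Under this reading, the expanded bank would initialize the hidden layer of a larger auto-encoder before fine-tuning on the full dataset, which is where the reported training speed-up would come from.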
