Improving Performance on Problems with Few Labelled Data by Reusing Stacked Auto-Encoders

Deep architectures have been used in transfer learning applications with the aim of improving the performance of networks designed for a given problem by reusing knowledge from another problem. In this work we addressed the transfer of knowledge between deep networks used as classifiers of digit and shape images, considering cases where only the set of class labels, or only the data distribution, changed from the source to the target problem. Our main goal was to study how the performance of knowledge transfer between such problems is affected by the number of layers being retrained and by the amount of data used in that retraining. In general, reusing networks trained for a different label set led to better results than reusing networks trained for a different data distribution. In particular, reusing a network trained on a larger set of classes for a problem with fewer classes was beneficial for virtually any amount of training data. In all cases, retraining only a single layer to save time consistently led to poorer performance. The results obtained when retraining a network trained on rotated digits to classify upright digits support the hypothesis that transfer learning can help deal with image classification problems for which only a small amount of labelled data is available for training.
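The abstract leaves the networks and the retraining setup unspecified; the following PyTorch sketch only illustrates the layer-retraining scheme it describes. All layer sizes and the names SAEClassifier and transfer_to_target are illustrative assumptions, and the unsupervised layer-wise pre-training of the stacked auto-encoder is omitted: an encoder trained on a source problem is reused, and only its top layers (plus a new classification layer) are retrained on the target problem.

    import torch.nn as nn

    # Hypothetical stacked-auto-encoder classifier: a sigmoid encoder
    # (assumed pretrained layer-wise on the source problem) topped by a
    # linear classification layer. Layer sizes are illustrative.
    class SAEClassifier(nn.Module):
        def __init__(self, n_classes):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Linear(28 * 28, 500), nn.Sigmoid(),  # layer 1
                nn.Linear(500, 250), nn.Sigmoid(),      # layer 2
                nn.Linear(250, 100), nn.Sigmoid(),      # layer 3
            )
            self.classifier = nn.Linear(100, n_classes)

        def forward(self, x):  # x: flattened images, shape (batch, 784)
            return self.classifier(self.encoder(x))

    def transfer_to_target(source, n_target_classes, n_retrained_layers):
        """Reuse a source-trained encoder for a target problem, retraining
        only the top `n_retrained_layers` encoder layers (the new
        classification layer is always trained from scratch)."""
        target = SAEClassifier(n_target_classes)
        target.encoder.load_state_dict(source.encoder.state_dict())
        for p in target.encoder.parameters():   # freeze the whole encoder...
            p.requires_grad = False
        linears = [m for m in target.encoder if isinstance(m, nn.Linear)]
        top = linears[-n_retrained_layers:] if n_retrained_layers > 0 else []
        for layer in top:                       # ...then unfreeze the top layers
            for p in layer.parameters():
                p.requires_grad = True
        return target

    # Example: reuse a rotated-digit network for upright digits, retraining
    # two of the three encoder layers on the small labelled target set.
    source = SAEClassifier(n_classes=10)        # assume trained on rotated digits
    target = transfer_to_target(source, n_target_classes=10, n_retrained_layers=2)

Varying n_retrained_layers between 1 and the full depth, and varying the size of the target training set, reproduces the two experimental axes the abstract mentions.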
