Deep multi-task learning with evolving weights

Pre-training of deep neural networks has largely been abandoned in recent years. The main reason is the difficulty of controlling overfitting and of tuning the increased number of hyper-parameters it introduces. In this paper we use a multi-task learning framework that combines weighted supervised and unsupervised tasks. We propose to evolve these weights over the training epochs in order to avoid the abrupt break in sequential transfer learning that occurs in the pre-training scheme. This framework also allows the use of unlabeled data. Extensive experiments on MNIST show promising results.
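To make the training scheme concrete, below is a minimal PyTorch sketch of the idea, not the paper's exact recipe. It assumes the unsupervised task is input reconstruction through a shared encoder and that the two loss weights are interpolated linearly over the epochs; the MultiTaskNet module, the task_weights helper, and the toy data loaders are illustrative stand-ins, not names from the paper.

import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    # Shared encoder feeding a supervised head (classification) and an
    # unsupervised head (input reconstruction).
    def __init__(self, in_dim=784, hidden=256, n_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.classifier = nn.Linear(hidden, n_classes)
        self.decoder = nn.Linear(hidden, in_dim)

    def forward(self, x):
        h = self.encoder(x)
        return self.classifier(h), self.decoder(h)

def task_weights(epoch, n_epochs):
    # Hypothetical linear schedule: the unsupervised weight decays from 1
    # to 0 while the supervised weight grows from 0 to 1, so the two tasks
    # hand over gradually rather than in one pre-train/fine-tune break.
    w_unsup = 1.0 - epoch / n_epochs
    return 1.0 - w_unsup, w_unsup

# Toy stand-ins for MNIST mini-batches (labeled and unlabeled).
labeled_loader = [(torch.randn(32, 784), torch.randint(0, 10, (32,)))
                  for _ in range(4)]
unlabeled_loader = [torch.randn(32, 784) for _ in range(4)]

model = MultiTaskNet()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
ce, mse = nn.CrossEntropyLoss(), nn.MSELoss()
n_epochs = 50

for epoch in range(n_epochs):
    w_sup, w_unsup = task_weights(epoch, n_epochs)
    for x, y in labeled_loader:
        logits, recon = model(x)
        # Combined loss: L(t) = w_sup(t) * L_class + w_unsup(t) * L_recon.
        loss = w_sup * ce(logits, y) + w_unsup * mse(recon, x)
        opt.zero_grad()
        loss.backward()
        opt.step()
    for x in unlabeled_loader:
        # Unlabeled data contribute through the reconstruction task only.
        _, recon = model(x)
        loss = w_unsup * mse(recon, x)
        opt.zero_grad()
        loss.backward()
        opt.step()

The linear schedule is only one possible choice; the point is that the hand-off from unsupervised to supervised learning is gradual, in contrast to the hard two-stage split of classical layer-wise pre-training.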
