Efficient representation ranking for transfer learning

Representation learning has recently emerged as a useful tool for extracting features from data. In a range of applications, features learned from data have been shown to be superior to their hand-crafted counterparts, and many deep learning approaches take advantage of such feature extraction. However, further research is needed on how learned features can be evaluated for re-use in related applications, with the goal of improving performance on those applications. In this paper, we present a new method for ranking the representations learned by a Restricted Boltzmann Machine (RBM), a model commonly used as a feature learner in deep networks. We show that high-ranking features under our method capture more information than low-ranking ones. We then apply representation ranking to prune the network, and propose a new transfer learning algorithm that uses the highest-ranked features extracted from a trained network to improve learning performance in another network trained on an analogous domain. We show that by transferring a small number of the highest-scored representations from the source domain, our method encourages the learning of new knowledge in the target domain while preserving most of the source-domain information during the transfer. This transfer learning is similar to self-taught learning in that it does not use the source-domain data during the transfer process.
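
To make the pipeline above concrete, the sketch below is a minimal illustration, not the paper's actual algorithm: it trains a small binary RBM on source data with one-step contrastive divergence, ranks its hidden units by a simple stand-in score (the L2 norm of each unit's incoming weights; the paper's ranking criterion is not reproduced here), and copies the top-ranked weight vectors into a target RBM before target-only training. All names, sizes, and the scoring heuristic are illustrative assumptions.

```python
# Illustrative sketch only: the ranking score (weight-norm) is an assumed
# stand-in for the paper's scoring method, and all sizes are arbitrary.
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_rbm(data, n_hidden, epochs=10, lr=0.05):
    """Train a binary RBM with one-step contrastive divergence (CD-1)."""
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_v = np.zeros(n_visible)
    b_h = np.zeros(n_hidden)
    for _ in range(epochs):
        v0 = data
        p_h0 = sigmoid(v0 @ W + b_h)                      # positive phase
        h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
        p_v1 = sigmoid(h0 @ W.T + b_v)                    # reconstruction
        p_h1 = sigmoid(p_v1 @ W + b_h)                    # negative phase
        W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / len(data)
        b_v += lr * (v0 - p_v1).mean(axis=0)
        b_h += lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_v, b_h


def rank_hidden_units(W):
    """Assumed ranking score: L2 norm of each hidden unit's weight vector."""
    scores = np.linalg.norm(W, axis=0)
    return np.argsort(scores)[::-1]          # highest-scoring units first


# Source task: train an RBM and rank its learned representations.
source_data = (rng.random((500, 64)) > 0.5).astype(float)
W_src, _, _ = train_rbm(source_data, n_hidden=100)
top_k = rank_hidden_units(W_src)[:20]        # keep the 20 highest-ranked units

# Target task: initialise part of the target RBM with the transferred
# representations, leave the rest random, then train on target data only.
n_hidden_tgt = 50
W_tgt = 0.01 * rng.standard_normal((64, n_hidden_tgt))
W_tgt[:, :len(top_k)] = W_src[:, top_k]
```

Only the transferred weight columns carry source knowledge; as in self-taught learning, no source data is needed once the target network has been initialised.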
