Unsupervised feature extraction with autoencoder trees

Abstract: The autoencoder is a popular neural network model that learns hidden representations of unlabeled data. Typically, single- or multilayer perceptrons are used to construct an autoencoder, but we use soft decision trees (i.e., hierarchical mixtures of experts) instead. Such trees have internal nodes that implement soft multivariate splits through a gating function, and the output is a weighted sum over all leaves, each leaf weighted by the product of the gating values along its path. The encoder tree converts the input to a lower-dimensional representation in its leaves, which it passes to the decoder tree, which reconstructs the original input. Because the splits are soft, the encoder and decoder trees can be trained back to back with stochastic gradient descent to minimize reconstruction error. In our experiments on handwritten digits, newsgroup posts, and images, we observe that autoencoder trees yield reconstruction errors as small as, and sometimes smaller than, those of autoencoder perceptrons. One advantage of the tree is that it learns a hierarchical representation at different resolutions at its different levels, and the leaves specialize in different local regions of the input space. An extension with locally linear mappings in the leaves allows a more flexible model. We also show that the autoencoder tree can be used with multimodal data, where a mapping from one modality (e.g., image) to another (e.g., topics) can be learned.
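
Since the abstract only sketches the mechanism, a minimal forward-pass sketch may help. The snippet below is an illustration, not the authors' implementation: the class name SoftTree, the fixed complete-tree depth, the sigmoid gating, and the random initialization are all assumptions made for this example. Each internal node i computes a gating value g_i = sigmoid(w_i . x + b_i); a leaf's weight is the product of the gating values on its root-to-leaf path, and the tree's output is the weighted sum of the leaf response vectors. The encoder tree's output dimension is the code size, which the decoder tree maps back to the input dimension. In the paper, all parameters are trained jointly by stochastic gradient descent on the reconstruction error; the training loop is omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class SoftTree:
    """Complete soft binary tree of fixed depth (hypothetical minimal form).

    Internal node i holds a gating hyperplane (W[i], b[i]); leaf j holds a
    response vector leaves[j]. The output is the sum of leaf responses,
    each weighted by the product of gating values on its root-to-leaf path.
    """

    def __init__(self, depth, in_dim, out_dim):
        n_internal = 2 ** depth - 1
        n_leaves = 2 ** depth
        self.W = 0.01 * rng.standard_normal((n_internal, in_dim))
        self.b = np.zeros(n_internal)
        self.leaves = 0.01 * rng.standard_normal((n_leaves, out_dim))

    def forward(self, x):
        # probs[i] = soft path weight of reaching node i; nodes are stored
        # heap-style, so the children of node i are 2i+1 and 2i+2.
        n_internal = len(self.W)
        probs = np.zeros(2 * n_internal + 1)
        probs[0] = 1.0
        for i in range(n_internal):
            g = sigmoid(self.W[i] @ x + self.b[i])   # soft multivariate split
            probs[2 * i + 1] = probs[i] * g          # left child
            probs[2 * i + 2] = probs[i] * (1.0 - g)  # right child
        leaf_probs = probs[n_internal:]              # last 2**depth entries
        return leaf_probs @ self.leaves              # gated sum over leaves

# Tiny demo: encode a 16-d input into a 2-d code and reconstruct it.
enc = SoftTree(depth=2, in_dim=16, out_dim=2)   # encoder tree -> code
dec = SoftTree(depth=2, in_dim=2, out_dim=16)   # decoder tree -> input space
x = rng.standard_normal(16)
x_hat = dec.forward(enc.forward(x))
print("squared reconstruction error:", float(np.mean((x - x_hat) ** 2)))
```

Note that the leaf weights sum to one by construction, which is exactly the hierarchical mixture-of-experts view mentioned in the abstract; making the splits soft is what makes the whole pipeline differentiable and hence trainable by gradient descent.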
