Exponential Machines

Modeling interactions between features improves the performance of machine learning solutions in many domains (e.g., recommender systems or sentiment analysis). In this paper, we introduce Exponential Machines (ExM), a predictor that models all interactions of every order. The key idea is to represent the exponentially large tensor of parameters in a compact factorized format called Tensor Train (TT). The Tensor Train format regularizes the model and allows us to control the number of underlying parameters. To train the model, we develop a stochastic Riemannian optimization procedure, which lets us fit tensors with 2^160 entries. We show that the model achieves state-of-the-art performance on synthetic data with high-order interactions and performs on par with high-order factorization machines on the MovieLens 100K recommender system dataset.
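
To make the idea concrete, below is a minimal sketch of how such a predictor can be evaluated without ever materializing the exponentially large parameter tensor. It assumes a natural instantiation of the model in which the score is the contraction of a weight tensor W, with one binary index per feature, against the rank-one tensor (1, x_1) ⊗ … ⊗ (1, x_d); the function name exm_predict and the cores layout are illustrative, not the authors' reference implementation. Under that assumption, prediction reduces to a chain of d small matrix products over the TT-cores.

import numpy as np

def exm_predict(x, cores):
    """Score one example under a weight tensor stored in Tensor Train format.

    x     : 1-D array of d features.
    cores : list of d TT-cores; cores[k] has shape (r_k, 2, r_{k+1}),
            with boundary ranks r_0 = r_d = 1.
    Contracts the 2^d parameter tensor with the rank-one tensor
    (1, x_1) x ... x (1, x_d) as d small matrix products, so the
    cost is O(d * r^2) rather than O(2^d).
    """
    v = np.ones(1)
    for x_k, G in zip(x, cores):
        # Slice 0 of each core carries the "feature absent" term,
        # slice 1 the "feature present" term scaled by x_k.
        v = v @ (G[:, 0, :] + x_k * G[:, 1, :])
    return float(v[0])

# Usage: random rank-3 TT-cores for d = 160 features; the underlying
# (never materialized) parameter tensor has 2^160 entries.
rng = np.random.default_rng(0)
d, r = 160, 3
cores = [rng.normal(size=(1 if k == 0 else r, 2, 1 if k == d - 1 else r))
         for k in range(d)]
print(exm_predict(rng.normal(size=d), cores))

Because the TT-rank r caps the number of parameters at roughly 4 * d * r^2, it acts as the knob, mentioned above, that simultaneously regularizes the model and controls its capacity.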
