Distributed Flexible Nonlinear Tensor Factorization

Tensor factorization is a powerful tool for analysing multi-way data. Compared with traditional multi-linear methods, nonlinear tensor factorization models can capture more complex relationships in the data. However, they are computationally expensive and can suffer from severe learning bias when the data are extremely sparse. To overcome these limitations, in this paper we propose a distributed, flexible nonlinear tensor factorization model. Our model avoids the expensive computations and structural restrictions of the Kronecker product in existing tensor-variate Gaussian process (TGP) formulations, allowing an arbitrary subset of tensor entries to be selected for training. At the same time, we derive a tractable and tight variational evidence lower bound (ELBO) that enables highly decoupled, parallel computations and high-quality inference. Based on the new bound, we develop a distributed inference algorithm in the MapReduce framework, which is key-value-free and can fully exploit the memory cache mechanism in fast MapReduce systems such as Spark. Experimental results demonstrate the advantages of our method over several state-of-the-art approaches, in terms of both predictive performance and computational efficiency. Moreover, our approach shows promise for Click-Through-Rate (CTR) prediction in online advertising.
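The key-value-free MapReduce pattern mentioned above can be illustrated with a minimal sketch. It is not the paper's algorithm or bound: it assumes, for illustration only, that each selected tensor entry is fed to a kernel as the concatenation of its per-mode latent factors, and that the quantities to aggregate are the sums typical of sparse Gaussian process bounds built on a small set of inducing inputs. All names (`entry_input`, `map_shard`, `reduce_stats`, the RBF kernel, the toy sizes) are hypothetical.

```python
import math
import random
from functools import reduce


def entry_input(factors, index):
    """Concatenate the latent factor rows selected by one tensor index (assumed input form)."""
    x = []
    for mode, i in enumerate(index):
        x.extend(factors[mode][i])
    return x


def rbf(x, z, lengthscale=1.0):
    """Squared-exponential kernel between two vectors (an illustrative choice)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-0.5 * sq / lengthscale ** 2)


def map_shard(shard, factors, inducing):
    """MAP step: local statistics for one shard of (index, value) entries.

    Returns (A, b) with A = sum_j k_j k_j^T and b = sum_j y_j k_j,
    where k_j is the kernel vector between the inducing inputs and entry j.
    No keys are emitted; the output is a single fixed-size pair.
    """
    p = len(inducing)
    A = [[0.0] * p for _ in range(p)]
    b = [0.0] * p
    for index, y in shard:
        x = entry_input(factors, index)
        k = [rbf(z, x) for z in inducing]
        for r in range(p):
            b[r] += y * k[r]
            for c in range(p):
                A[r][c] += k[r] * k[c]
    return A, b


def reduce_stats(s1, s2):
    """REDUCE step: element-wise sum of two local statistics."""
    (A1, b1), (A2, b2) = s1, s2
    A = [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(A1, A2)]
    b = [u + v for u, v in zip(b1, b2)]
    return A, b


# Toy setup: a 4 x 5 x 3 tensor with rank-2 latent factors per mode.
random.seed(0)
dims, rank, num_inducing = (4, 5, 3), 2, 3
factors = [[[random.gauss(0, 1) for _ in range(rank)] for _ in range(d)] for d in dims]
inducing = [[random.gauss(0, 1) for _ in range(rank * len(dims))] for _ in range(num_inducing)]

# An arbitrary subset of entries; no Kronecker (full-grid) structure is required.
entries = [(tuple(random.randrange(d) for d in dims), random.gauss(0, 1)) for _ in range(20)]

# Split the entries into shards, map each shard independently, then sum.
shards = [entries[i::4] for i in range(4)]
A_dist, b_dist = reduce(reduce_stats, (map_shard(s, factors, inducing) for s in shards))

# Reference: the same statistics computed on a single machine.
A_full, b_full = map_shard(entries, factors, inducing)
```

Because every mapper returns one fixed-size pair and the reducer is a plain sum, no shuffle by key is needed, and a shard can stay cached in memory across iterations. The distributed and single-machine results agree up to floating-point rounding.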
