DMS: Distributed Sparse Tensor Factorization with Alternating Least Squares

Tensors are data structures indexed along three or more dimensions. They have found increasing use in domains such as data mining and recommender systems, where dimensions can have enormous length and the resulting data are very sparse. The canonical polyadic decomposition (CPD) is the most popular tensor factorization for discovering latent features and is most commonly computed via the method of alternating least squares (CPD-ALS). Factoring large, sparse tensors is a computationally challenging task that can no longer be performed in the memory of a typical workstation. State-of-the-art methods for distributed-memory systems have focused on decomposing the tensor in a one-dimensional (1D) fashion, which prohibitively requires the dense matrix factors to be fully replicated on each node. To address these limitations, we present DMS, a novel distributed CPD-ALS algorithm. DMS uses a 3D decomposition that avoids complete factor replication and the associated communication. DMS has a hybrid MPI+OpenMP implementation that exploits multicore architectures with a low memory footprint. We theoretically evaluate DMS against leading CPD-ALS methods and experimentally compare them across a variety of datasets. Our 3D decomposition reduces communication volume by 74% on average and is over 35x faster than state-of-the-art MPI code on a tensor with 1.7 billion nonzeros.
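To make the alternating least squares updates concrete, below is a minimal single-machine sketch of CPD-ALS for a dense 3-way array, written with NumPy. It is illustrative only: the function names (`cp_als`, `khatri_rao`, `unfold`) are assumptions for this sketch, it operates on dense data, and it does not reflect the distributed, sparse, MPI+OpenMP design of DMS.

```python
# Minimal CPD-ALS sketch for a dense 3-way array (NOT the distributed DMS algorithm).
import numpy as np

def khatri_rao(U, V):
    """Column-wise Khatri-Rao product of U (m x R) and V (n x R) -> (m*n x R)."""
    m, R = U.shape
    n, _ = V.shape
    return (U[:, None, :] * V[None, :, :]).reshape(m * n, R)

def unfold(X, mode):
    """Mode-n unfolding of a 3-way array into a matrix (mode dimension becomes rows)."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def cp_als(X, rank, iters=50, seed=0):
    """Fit a rank-`rank` CP decomposition of a 3-way array X with plain ALS."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    for _ in range(iters):
        # Update each factor in turn while holding the other two fixed
        # (the normal equations use the Hadamard product of Gram matrices).
        A = unfold(X, 0) @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = unfold(X, 1) @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = unfold(X, 2) @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C
```

In a distributed sparse setting, the dominant cost of each update is the sparse matricized-tensor-times-Khatri-Rao product and the exchange of factor rows between nodes, which is precisely the communication that the 3D decomposition in the abstract targets.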
