ParCube: Sparse Parallelizable CANDECOMP-PARAFAC Tensor Decomposition

How can we efficiently decompose a tensor into sparse factors when the data do not fit in memory? Tensor decompositions have gained steadily increasing popularity in data-mining applications; however, the current state-of-the-art decomposition algorithms operate in main memory and do not scale to truly large datasets. In this work, we propose ParCube, a new and highly parallelizable method for speeding up tensor decompositions that is well suited to producing sparse approximations. Experiments with even moderately large data indicate over 90% sparser outputs and 14 times faster execution, with approximation error close to the current state of the art irrespective of computation and memory requirements. We provide theoretical guarantees for the algorithm's correctness and validate our claims through extensive experiments on four different real-world datasets (Enron, LBNL, Facebook and NELL), demonstrating its effectiveness for data-mining practitioners. In particular, we are the first to analyze the very large NELL dataset using a sparse tensor decomposition, demonstrating that ParCube enables us to handle very large datasets effectively and efficiently. Finally, we make our highly scalable parallel implementation publicly available, enabling reproducibility of our work.
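For readers unfamiliar with the model named in the title, the following is a minimal sketch of what a CANDECOMP-PARAFAC (CP) representation with sparse factors looks like for a three-way tensor. It is not the ParCube algorithm, which the abstract does not describe in detail; it only illustrates the model being approximated and one simple way to quantify factor sparsity. The helper names `cp_reconstruct` and `sparsity` and the toy factor matrices are hypothetical, chosen for illustration.

```python
import numpy as np

def cp_reconstruct(A, B, C):
    """Rebuild a three-way tensor from CP factors.

    X[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r],
    i.e. a sum of R rank-one tensors (one per factor column).
    """
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def sparsity(M):
    """Fraction of entries of M that are exactly zero."""
    return 1.0 - np.count_nonzero(M) / M.size

# Hypothetical toy factors with R = 2 components (illustration only).
A = np.array([[1.0, 0.0],
              [2.0, 0.0],
              [0.0, 3.0]])          # 3 x 2
B = np.array([[1.0, 0.0],
              [0.0, 1.0]])          # 2 x 2
C = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [1.0, 1.0],
              [0.0, 0.0]])          # 4 x 2

X = cp_reconstruct(A, B, C)         # tensor of shape (3, 2, 4)

# The same tensor written explicitly as a sum of outer products.
X_outer = sum(
    np.multiply.outer(np.multiply.outer(A[:, r], B[:, r]), C[:, r])
    for r in range(A.shape[1])
)
```

In this toy setting, sparse factor columns yield a sparse reconstructed tensor, and `sparsity` applied to each factor gives the kind of quantity a claim such as "sparser outputs" would compare across methods.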
