Efficient Static and Dynamic In-Database Tensor Decompositions on Chunk-Based Array Stores

As data sets grow, existing in-memory schemes for tensor decomposition become increasingly ineffective, and memory-independent solutions, such as in-database analytics, become necessary. In this paper, we present techniques for efficiently implementing in-database tensor decompositions on chunk-based array data stores. The proposed static and incremental in-database tensor decomposition operators, together with their optimizations, address the constraints that limited main memory imposes when handling large, high-order tensor data. First, we discuss how to implement alternating least squares operations efficiently on a chunk-based data storage system. Second, we consider scenarios with frequent data updates and show that compressed matrix multiplication techniques can be effective in reducing the maintenance costs of incremental tensor decomposition. To the best of our knowledge, this paper presents the first attempt to develop efficient and optimized in-database tensor decomposition operations. We evaluate the proposed algorithms on tensor data sets that do not fit into the available memory, and the results show that the proposed techniques significantly improve the scalability of this core data analysis operation.
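To illustrate why alternating least squares suits a chunk-based store, the following minimal sketch computes the matricized-tensor-times-Khatri-Rao product (MTTKRP), the dominant step of each ALS factor update for a CP decomposition, by accumulating contributions one chunk at a time. It is an illustration under stated assumptions, not the operators proposed in this paper: the chunk layout (a dictionary keyed by chunk origin) and the helper names `make_chunks`, `mttkrp_mode0_chunked`, and `mttkrp_mode0_direct` are hypothetical, and a real array store would read chunks from disk rather than from an in-memory dictionary.

```python
from itertools import product


def make_chunks(tensor, chunk):
    """Split a dense I x J x K nested-list tensor into cubic chunks.

    Hypothetical layout: each chunk is a list of (i, j, k, value) cells,
    keyed by the chunk's origin coordinates.
    """
    I, J, K = len(tensor), len(tensor[0]), len(tensor[0][0])
    chunks = {}
    for i0, j0, k0 in product(range(0, I, chunk),
                              range(0, J, chunk),
                              range(0, K, chunk)):
        chunks[(i0, j0, k0)] = [
            (i, j, k, tensor[i][j][k])
            for i in range(i0, min(i0 + chunk, I))
            for j in range(j0, min(j0 + chunk, J))
            for k in range(k0, min(k0 + chunk, K))
        ]
    return chunks


def mttkrp_mode0_chunked(chunks, B, C, I):
    """Accumulate M[i][r] = sum_{j,k} X[i,j,k] * B[j][r] * C[k][r].

    Only one chunk is touched at a time, so the working set is one chunk
    plus the factor matrices, independent of the full tensor size.
    """
    R = len(B[0])
    M = [[0.0] * R for _ in range(I)]
    for cells in chunks.values():  # stand-in for reading a chunk from storage
        for i, j, k, x in cells:
            for r in range(R):
                M[i][r] += x * B[j][r] * C[k][r]
    return M


def mttkrp_mode0_direct(tensor, B, C):
    """Reference computation over the whole tensor, for comparison only."""
    I, J, K = len(tensor), len(tensor[0]), len(tensor[0][0])
    R = len(B[0])
    M = [[0.0] * R for _ in range(I)]
    for i, j, k in product(range(I), range(J), range(K)):
        for r in range(R):
            M[i][r] += tensor[i][j][k] * B[j][r] * C[k][r]
    return M


# Small integer-valued example so both computations agree exactly.
X = [[[i + 2 * j + 3 * k for k in range(3)] for j in range(4)] for i in range(5)]
B = [[j + 1, 2 * j] for j in range(4)]  # 4 x 2 factor matrix (mode 1)
C = [[k, k + 3] for k in range(3)]      # 3 x 2 factor matrix (mode 2)

chunked = mttkrp_mode0_chunked(make_chunks(X, chunk=2), B, C, I=5)
direct = mttkrp_mode0_direct(X, B, C)
```

Because the sum over cells is order-independent, chunks can be visited in whatever order the storage layer delivers them. A full ALS iteration would then solve a small R x R linear system per mode using this product, a step that does not depend on the tensor size.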
