Pushing-Down Tensor Decompositions over Unions to Promote Reuse of Materialized Decompositions

From data collection to decision making, the life cycle of data often involves many steps of integration, manipulation, and analysis. To provide end-to-end support for this full life cycle, today's data management and decision-making systems increasingly combine operations for data manipulation and integration with operations for data analysis. The tensor-relational model (TRM) is a framework proposed to support both relational algebraic operations (for data manipulation and integration) and tensor algebraic operations (for data analysis). In this paper, we consider the joint processing of relational algebraic and tensor analysis operations. In particular, we focus on data processing workflows that involve data integration from multiple sources (through unions) and tensor decomposition tasks. While in traditional relational algebra the costliest operation is known to be the join, in a framework that provides both relational and tensor operations, tensor decomposition tends to be the most computationally expensive operation. It is therefore critical to restructure the data processing workflow so as to reduce the cost of the tensor decomposition step. To this end, we propose a novel scheme for pushing tensor decompositions down over union operations, which reduces overall data processing times and promotes reuse of materialized tensor decomposition results. Experimental results confirm the efficiency and effectiveness of the proposed scheme.
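The idea of pushing a decomposition below a union can be illustrated with CP (PARAFAC) models. The following is a minimal sketch under stated assumptions, not the scheme evaluated in this paper: it assumes that the union of two relations whose tuples do not overlap corresponds to the sum of their tensors over a shared index space, and that rank-R CP factors for each operand have already been computed and materialized. Under these assumptions, concatenating the operands' factor matrices column-wise gives an exact rank-(R1+R2) model of the union without decomposing the combined tensor again. The helper names `cp_reconstruct` and `combine_cp` and all factor values are hypothetical.

```python
from itertools import product


def cp_reconstruct(factors, shape):
    """Rebuild a 3-mode tensor from CP factors: X[i,j,k] = sum_r A[i][r]*B[j][r]*C[k][r].

    Each factor matrix is a list of rows with one entry per rank-one component.
    The result is a dict keyed by (i, j, k), mirroring a relation of cell tuples.
    """
    A, B, C = factors
    rank = len(A[0])
    I, J, K = shape
    return {
        (i, j, k): sum(A[i][r] * B[j][r] * C[k][r] for r in range(rank))
        for i, j, k in product(range(I), range(J), range(K))
    }


def combine_cp(f1, f2):
    """Hypothetical helper: concatenate the components of two CP models.

    If X1 = [[A1, B1, C1]] and X2 = [[A2, B2, C2]] share the same index space,
    then X1 + X2 = [[A1|A2, B1|B2, C1|C2]], a model of rank R1 + R2.
    """
    return tuple(
        [row1 + row2 for row1, row2 in zip(m1, m2)]
        for m1, m2 in zip(f1, f2)
    )


# Assumed materialized rank-1 models of two union operands; their non-zero
# cells fall on different mode-1 indices, so the tuples do not overlap.
f1 = ([[1.0], [0.0]], [[1.0], [2.0]], [[1.0], [1.0]])
f2 = ([[0.0], [3.0]], [[1.0], [1.0]], [[2.0], [0.0]])
shape = (2, 2, 2)

combined = combine_cp(f1, f2)
x1 = cp_reconstruct(f1, shape)
x2 = cp_reconstruct(f2, shape)
x_union = cp_reconstruct(combined, shape)

# The combined model reproduces the union (tensor sum) cell by cell.
assert all(x_union[idx] == x1[idx] + x2[idx] for idx in x_union)
```

In practice the operand decompositions are approximate, so the concatenated model inherits their errors and has a larger rank than may be desired; a system would likely refine or recompress it (for example with a few ALS iterations) rather than use the concatenation as-is.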
