Decomposition-by-normalization (DBN): leveraging approximate functional dependencies for efficient CP and Tucker decompositions

Many multi-dimensional data applications need to support both tensor operations and relational operations throughout the data lifecycle. Tensor-based representations, including the two widely used tensor decompositions, CP and Tucker, have proven effective in multi-aspect data analysis, and tensor decomposition is an important tool for capturing high-order structures in multi-dimensional data. Its cost, however, is often very high. Since the number of modes of the tensor data is one of the main factors contributing to the cost of tensor operations, in this paper we focus on reducing the number of modes of the input tensors to tackle the computational cost of the decomposition process. We propose a novel decomposition-by-normalization (DBN) scheme that first normalizes the given relation into smaller tensors based on the functional dependencies of the relation, then decomposes these smaller tensors, and finally recombines the sub-results to obtain the overall decomposition. The decomposition and recombination steps of the scheme fit naturally in settings with multiple cores. This leads to a highly efficient, effective, and parallelized decomposition-by-normalization algorithm for CP and Tucker decompositions of both dense and sparse tensors. Experimental results confirm the efficiency and effectiveness of the proposed scheme compared to the conventional nonnegative CP decomposition and Tucker decomposition approaches.
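To make the normalization step concrete, here is a minimal Python sketch of splitting a relation along a functional dependency and checking that the split is lossless. It is an illustration under stated assumptions, not the paper's algorithm: the toy relation, its attribute names, and the helper functions `fd_holds`, `project`, and `natural_join` are all hypothetical, the dependency is checked exactly rather than approximately, and the tensor decomposition and recombination steps are not shown.

```python
# Hypothetical toy relation; attribute names and rows are illustrative only.
ATTRS = ("author", "venue", "affiliation", "country")
ROWS = [
    ("ann", "KDD", "ASU", "US"),
    ("ann", "SDM", "ASU", "US"),
    ("bob", "KDD", "CMU", "US"),
    ("eve", "ICML", "ETH", "CH"),
]


def fd_holds(rows, attrs, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds exactly."""
    li = [attrs.index(a) for a in lhs]
    ri = [attrs.index(a) for a in rhs]
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in li)
        val = tuple(row[i] for i in ri)
        # Two rows that agree on lhs must also agree on rhs.
        if seen.setdefault(key, val) != val:
            return False
    return True


def project(rows, attrs, keep):
    """Project the relation onto the attributes in keep, dropping duplicates."""
    idx = [attrs.index(a) for a in keep]
    return sorted({tuple(row[i] for i in idx) for row in rows})


def natural_join(rows_a, attrs_a, rows_b, attrs_b):
    """Join two relations on their shared attributes (nested-loop join)."""
    shared = [a for a in attrs_a if a in attrs_b]
    extra = [a for a in attrs_b if a not in attrs_a]
    out = set()
    for ra in rows_a:
        for rb in rows_b:
            if all(ra[attrs_a.index(s)] == rb[attrs_b.index(s)] for s in shared):
                out.add(ra + tuple(rb[attrs_b.index(e)] for e in extra))
    return tuple(attrs_a) + tuple(extra), sorted(out)


# Assumed dependency: author -> (affiliation, country).
lhs, rhs = ("author",), ("affiliation", "country")
assert fd_holds(ROWS, ATTRS, lhs, rhs)

# Split the 4-attribute relation into two lower-order sub-relations
# that share the determinant attribute "author".
attrs1 = ("author", "venue")
attrs2 = ("author", "affiliation", "country")
r1 = project(ROWS, ATTRS, attrs1)
r2 = project(ROWS, ATTRS, attrs2)

# Because the dependency holds, joining the parts recovers the original rows.
joined_attrs, joined = natural_join(r1, attrs1, r2, attrs2)
print(joined_attrs, joined == sorted(ROWS))
```

In this sketch, each sub-relation would correspond to a tensor with fewer modes than the original, which is the source of the cost reduction the abstract describes; how the sub-decompositions are recombined is left to the paper itself.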
