H-PARAFAC: Hierarchical Parallel Factor Analysis of Multidimensional Big Data

Examining massive multidimensional data corrupted by high levels of noise and interference, by extracting the multi-way factors embedded in it, has long been an important problem in many disciplines. As data scale and dimensionality grow rapidly in the big data era, two research challenges arise: (1) capturing the dynamics of large tensors without introducing significant distortion during factorization, and (2) handling the influence of noise in sophisticated applications. H-PARAFAC, a hierarchical parallel processing framework running on a GPU cluster, has been developed to enable scalable factorization of large tensors based on a “divide-and-conquer” principle for Parallel Factor Analysis (PARAFAC). The framework combines a coarse-grained model that coordinates the processing of sub-tensors with a fine-grained parallel model that computes each sub-tensor and fuses the resulting sub-factors. Experimental results indicate that (1) the proposed method overcomes the limitation on the scale of multidimensional data that can be factorized and dramatically outperforms traditional counterparts in both scalability and efficiency; for example, the runtime grows on the order of $n^2$ when the data volume grows on the order of $n^3$; (2) H-PARAFAC has the potential to suppress the influence of significant noise; and (3) H-PARAFAC is far superior to conventional window-based counterparts in preserving the features of multiple modes of large tensors.
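The abstract does not describe how H-PARAFAC partitions a tensor or fuses sub-factors, so the sketch below shows only the conventional single-node step that a divide-and-conquer PARAFAC scheme builds on: fitting the rank-$R$ model $x_{ijk} \approx \sum_{r} a_{ir} b_{jr} c_{kr}$ to a small 3-way tensor by alternating least squares (ALS). Using ALS as the per-sub-tensor solver is an assumption here, not something the abstract states, and the helper names (`cp_als`, `gram`, `solve`, `reconstruct`, `rel_error`) are hypothetical. The code is dependency-free Python written for readability rather than speed; each mode update solves normal equations whose matrix is the Hadamard product of the other two factors' Gram matrices.

```python
import random

# Hypothetical helpers; ALS as the per-sub-tensor PARAFAC solver is an assumption.

def gram(F, R):
    """R x R Gram matrix F^T F of a factor matrix stored as a list of rows."""
    return [[sum(row[r] * row[s] for row in F) for s in range(R)] for r in range(R)]

def solve(G, b):
    """Solve the small system G x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(G[r]) + [b[r]] for r in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def reconstruct(F, shape):
    """Rebuild x_ijk = sum_r a_ir * b_jr * c_kr from factors F = [A, B, C]."""
    A, B, C = F
    R = len(A[0])
    I, J, K = shape
    return [[[sum(A[i][r] * B[j][r] * C[k][r] for r in range(R))
              for k in range(K)] for j in range(J)] for i in range(I)]

def rel_error(X, Y):
    """Relative Frobenius-norm error ||X - Y|| / ||X|| between two 3-way tensors."""
    num = sum((x - y) ** 2 for px, py in zip(X, Y)
              for qx, qy in zip(px, py) for x, y in zip(qx, qy))
    den = sum(x ** 2 for px in X for qx in px for x in qx)
    return (num / den) ** 0.5

def cp_als(X, shape, R, iters=300, seed=0):
    """Fit a rank-R PARAFAC (CP) model to the 3-way tensor X by ALS."""
    rng = random.Random(seed)
    F = [[[rng.random() for _ in range(R)] for _ in range(n)] for n in shape]
    I, J, K = shape
    for _ in range(iters):
        for mode in range(3):
            a, b = [m for m in range(3) if m != mode]
            # Normal-equation matrix: Hadamard product of the other two Gram matrices.
            Ga, Gb = gram(F[a], R), gram(F[b], R)
            G = [[Ga[r][s] * Gb[r][s] for s in range(R)] for r in range(R)]
            # Contract X with the other two factors (matricized tensor times Khatri-Rao product).
            M = [[0.0] * R for _ in range(shape[mode])]
            for i in range(I):
                for j in range(J):
                    for k in range(K):
                        idx, v = (i, j, k), X[i][j][k]
                        row = M[idx[mode]]
                        for r in range(R):
                            row[r] += v * F[a][idx[a]][r] * F[b][idx[b]][r]
            # Least-squares update of this mode with the other two factors held fixed.
            F[mode] = [solve(G, M[n]) for n in range(shape[mode])]
    return F

if __name__ == "__main__":
    rng = random.Random(1)
    shape, R = (4, 5, 6), 2
    true = [[[rng.gauss(0, 1) for _ in range(R)] for _ in range(n)] for n in shape]
    X = reconstruct(true, shape)  # an exactly rank-2 test tensor
    F = cp_als(X, shape, R)
    print("relative error:", rel_error(X, reconstruct(F, shape)))
```

In a cluster setting, the costly part of each update is the triple loop (the tensor-times-Khatri-Rao contraction), which would be the natural candidate for fine-grained GPU parallelism; how H-PARAFAC actually distributes this work is not stated in the abstract.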