Parallel Tensor Compression for Large-Scale Scientific Data

As parallel computing trends towards the exascale, scientific data produced by high-fidelity simulations are growing increasingly massive. For instance, a simulation on a three-dimensional spatial grid with 512 points per dimension that tracks 64 variables per grid point for 128 time steps yields 8 TB of data, assuming double precision. By viewing the data as a dense five-way tensor, we can compute a Tucker decomposition to find inherent low-dimensional multilinear structure, achieving compression ratios of up to 5000 on real-world data sets with negligible loss in accuracy. So that we can operate on such massive data, we present the first-ever distributed-memory parallel implementation for the Tucker decomposition, whose key computations correspond to parallel linear algebra operations, albeit with nonstandard data layouts. Our approach specifies a data distribution for tensors that avoids any tensor data redistribution, either locally or in parallel. We provide accompanying analysis of the computation and communication costs of the algorithms. To demonstrate the compression and accuracy of the method, we apply our approach to real-world data sets from combustion science simulations. We also provide detailed performance results, including parallel performance in both weak and strong scaling experiments.
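The storage arithmetic behind the 8 TB figure, and the way a Tucker representation reduces it, can be sketched in a few lines. This is an illustrative calculation only, not the paper's implementation. The helper names (`dense_bytes`, `tucker_bytes`, `compression_ratio`) and the reduced ranks are assumptions introduced here. The ranks are hypothetical and are not taken from the combustion data sets. The 8 TB figure uses binary units (2^43 bytes).

```python
from math import prod

BYTES_PER_DOUBLE = 8  # double precision, as assumed in the text


def dense_bytes(dims):
    """Bytes needed to store a dense tensor of doubles with the given mode sizes."""
    return prod(dims) * BYTES_PER_DOUBLE


def tucker_bytes(dims, ranks):
    """Bytes for a Tucker representation: one core tensor of size
    R_1 x ... x R_N plus one I_n x R_n factor matrix per mode."""
    core_entries = prod(ranks)
    factor_entries = sum(i * r for i, r in zip(dims, ranks))
    return (core_entries + factor_entries) * BYTES_PER_DOUBLE


def compression_ratio(dims, ranks):
    """Ratio of dense storage to Tucker storage."""
    return dense_bytes(dims) / tucker_bytes(dims, ranks)


# Five-way tensor from the example in the text:
# 512 x 512 x 512 spatial grid, 64 variables, 128 time steps.
dims = (512, 512, 512, 64, 128)

# HYPOTHETICAL reduced ranks, chosen only to illustrate the arithmetic.
ranks = (128, 128, 128, 16, 32)

print(f"dense size: {dense_bytes(dims) / 2**40} TiB")
print(f"compression ratio: {compression_ratio(dims, ranks):.1f}")
```

With these assumed ranks the core tensor dominates the compressed storage, so the ratio is close to the product of the per-mode reductions. The ratio achievable in practice depends on how quickly the singular values of each mode unfolding decay for the data at hand.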
