Deterministic matrix sketches for low-rank compression of high-dimensional simulation data

Matrices arising in scientific applications frequently admit linear low-rank approximations due to smoothness in the physical and/or temporal domain of the problem. In large-scale problems, computing an optimal low-rank approximation can be prohibitively expensive. Matrix sketching addresses this by reducing the input matrix to a smaller but representative matrix via a low-dimensional linear embedding. If the sketch matrix produced by the embedding sufficiently captures the geometric properties of the original matrix, a near-optimal approximation may be obtained. Much of the work in matrix sketching has centered on random projection. In this work, by contrast, deterministic matrix sketches that generate coarse representations compatible with the corresponding PDE solve are considered for computing the singular value decomposition (SVD) and the matrix interpolative decomposition (ID). These deterministic sketching approaches offer several advantages over randomized sketches. Broadly, randomized sketches are data-agnostic, whereas the proposed sketching methods exploit structure within data generated by complex PDE systems. The deterministic sketches are often faster, require access to only a small fraction of the input matrix, and do not need to be explicitly constructed. A novel single-pass power iteration algorithm, i.e., one requiring only a single read over the input, is also presented. This power iteration method is particularly effective in improving low-rank approximations when the singular values of the data decay slowly. Finally, theoretical error bounds and estimates, as well as numerical results for three application problems, are provided.
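
To make the idea of a deterministic, implicitly constructed sketch concrete, the following is a minimal NumPy illustration under simplifying assumptions; it is not the paper's algorithm. The "coarse representation" is assumed to be plain row subsampling of a snapshot matrix (every `stride`-th grid point). Skeleton columns are chosen on that coarse sketch by greedy pivoted Gram-Schmidt, and the resulting interpolative decomposition is converted to a truncated SVD. The function names `pivoted_columns`, `coarse_grid_id`, and `id_to_svd`, as well as the test matrix, are hypothetical. The single-pass power iteration is not reproduced here.

```python
import numpy as np


def pivoted_columns(Y, k):
    """Greedy column-pivoted Gram-Schmidt on Y; returns k column indices.

    Assumes k <= rank(Y); otherwise a zero pivot would be divided by.
    """
    R = np.array(Y, dtype=float)  # working copy; columns are deflated in place
    selected = []
    for _ in range(k):
        norms = np.linalg.norm(R, axis=0)
        j = int(np.argmax(norms))        # column with the largest residual norm
        selected.append(j)
        q = R[:, j] / norms[j]
        R -= np.outer(q, q @ R)          # remove the q-component from all columns
    return selected


def coarse_grid_id(A, stride, k):
    """Column ID  A ~= A[:, J] @ P  guided by a deterministic row sketch.

    Hypothetical sketch: keep every `stride`-th grid point (row) of A.
    The selection operator is never formed as a matrix; slicing plays its role.
    """
    Y = A[::stride, :]                       # coarse sketch, shape (m_c, n)
    J = pivoted_columns(Y, k)                # skeleton columns chosen on coarse data
    # Interpolation coefficients fitted on the sketch only: Y[:, J] @ P ~= Y
    P, *_ = np.linalg.lstsq(Y[:, J], Y, rcond=None)
    return J, P


def id_to_svd(C, P):
    """Turn A ~= C @ P (C: k selected fine-grid columns) into a rank-k SVD."""
    Q, R = np.linalg.qr(C)                   # reduced QR: Q is (m, k), R is (k, k)
    U_small, s, Vt = np.linalg.svd(R @ P, full_matrices=False)
    return Q @ U_small, s, Vt


# Example: smooth space-time data of exact rank 3 on a 401-point grid.
x = np.linspace(0.0, 1.0, 401)
t = np.linspace(0.0, 2.0, 60)
A = (np.outer(np.sin(np.pi * x), np.cos(t))
     + np.outer(x**2, t)
     + np.outer(np.exp(-x), np.sin(3.0 * t)))

J, P = coarse_grid_id(A, stride=20, k=3)     # coarse grid keeps 21 of 401 rows
U, s, Vt = id_to_svd(A[:, J], P)
rel_err = np.linalg.norm(A - U @ np.diag(s) @ Vt) / np.linalg.norm(A)
print(sorted(J), rel_err)
```

Because the sketch is a slice, the sketching operator is never stored, and only the coarse rows plus the k selected fine-grid columns of A are read. On this exactly rank-3 example the reconstruction is accurate to rounding error. For data with slow singular value decay, a sketch of this kind generally yields a less accurate approximation. The abstract identifies this regime as the one where power iteration is particularly effective.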
