A Multi-Platform Evaluation of the Randomized CX Low-Rank Matrix Factorization in Spark

We investigate the performance and scalability of the randomized CX low-rank matrix factorization and demonstrate its applicability through the analysis of a 1 TB mass spectrometry imaging (MSI) dataset, using Apache Spark on an Amazon EC2 cluster, a Cray XC40 system, and an experimental Cray cluster. We implemented the factorization twice: in C, parallelized and hand-tuned, and in Scala using the Apache Spark high-level cluster computing framework. The Spark implementation performed consistently across the three platforms: it processed the 1 TB dataset in under 30 minutes with 960 cores on every system, with the fastest times obtained on the experimental Cray cluster. In comparison, the C implementation processed the 1 TB dataset 21 times faster than Spark on the Amazon EC2 system, owing to careful cache optimizations, bandwidth-friendly matrix access patterns, and vectorized computation using SIMD units. We report these results and discuss their implications for the hardware and software issues that arise in supporting data-centric workloads in parallel and distributed environments.
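For readers unfamiliar with CX, the idea is to approximate a matrix A by C X, where C holds c actual columns of A and X = C⁺A. Columns are chosen with probability proportional to their leverage scores with respect to the top-k right singular subspace, and that subspace is itself approximated with a randomized SVD. The following is a minimal single-node sketch in Python/NumPy under stated assumptions: a Gaussian test matrix, a fixed number of power iterations, and sampling without replacement. It is not the Spark (Scala) or C code evaluated here, and the function names and parameters (`oversample`, `power_iters`) are hypothetical.

```python
import numpy as np


def randomized_right_singular_vectors(A, k, oversample=10, power_iters=2, rng=None):
    """Approximate the top-k right singular vectors of A (m x n).

    Assumption: Gaussian sketch of the row space plus `power_iters`
    rounds of multiplication by A^T A with re-orthonormalization.
    """
    rng = np.random.default_rng(rng)
    m, n = A.shape
    ell = min(n, k + oversample)
    # Y = A^T G spans (approximately) the dominant right singular subspace.
    Q, _ = np.linalg.qr(A.T @ rng.standard_normal((m, ell)))  # n x ell
    for _ in range(power_iters):
        # Power iteration sharpens the separation of the leading singular values.
        Q, _ = np.linalg.qr(A.T @ (A @ Q))
    # SVD of the small projected matrix A Q (m x ell): A Q = U S W^T,
    # so the right singular vectors of A are approximately Q W.
    _, _, Wt = np.linalg.svd(A @ Q, full_matrices=False)
    return Q @ Wt[:k].T  # n x k with orthonormal columns


def cx_decomposition(A, k, c, power_iters=2, rng=None):
    """Sketch of randomized CX: returns (column indices, C, X, leverage scores)."""
    rng = np.random.default_rng(rng)
    V = randomized_right_singular_vectors(A, k, power_iters=power_iters, rng=rng)
    # Leverage score of column j is the squared norm of row j of V; they sum to k.
    lev = np.sum(V**2, axis=1)
    p = lev / lev.sum()
    # Hypothetical choice: sample c distinct columns with probabilities p.
    cols = rng.choice(A.shape[1], size=c, replace=False, p=p)
    C = A[:, cols]
    # X = C^+ A, computed as a least-squares solve for numerical robustness.
    X = np.linalg.lstsq(C, A, rcond=None)[0]
    return cols, C, X, lev
```

In a distributed setting the dominant cost is the products `A.T @ (A @ Q)`. These decompose as a sum over row blocks of A, which is what makes the method a natural fit for row-partitioned data.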
