Matrix factorizations at scale: A comparison of scientific data analytics in Spark and C+MPI using three case studies

We explore the trade-offs of performing linear algebra using Apache Spark, compared to traditional C and MPI implementations on HPC platforms. Spark is designed for data analytics on cluster computing platforms with access to local disks and is optimized for data-parallel tasks. We examine three widely used and important matrix factorizations: NMF (for physical plausibility), PCA (for its ubiquity), and CX (for data interpretability). We apply these methods to 1.6TB of particle physics data, 2.2TB and 16TB of climate modeling data, and 1.1TB of bioimaging data. The data matrices are tall and skinny, which enables the algorithms to map conveniently into Spark's data-parallel model. We perform scaling experiments on up to 1600 Cray XC40 nodes, describe the sources of slowdowns, and provide tuning guidance to obtain high performance.
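To illustrate why tall-and-skinny matrices suit a data-parallel model, the following minimal sketch (not the implementation used in this work) computes the small Gram matrix A^T A as a sum of per-block contributions. It assumes the rows of A are split into blocks, as partitions would be on a cluster; the function names `block_gram`, `add_matrices`, and `distributed_gram` are hypothetical, and plain Python `map` and `reduce` stand in for the distributed map and reduce steps.

```python
from functools import reduce


def block_gram(block):
    """Return B^T B for one row block B (a list of rows of equal length n)."""
    n = len(block[0])
    gram = [[0.0] * n for _ in range(n)]
    for row in block:
        for i in range(n):
            for j in range(n):
                gram[i][j] += row[i] * row[j]
    return gram


def add_matrices(x, y):
    """Element-wise sum of two n-by-n matrices (the reduce step)."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def distributed_gram(blocks):
    """Map each row block to its n-by-n Gram matrix, then sum the results."""
    return reduce(add_matrices, map(block_gram, blocks))


# A tiny 4-by-2 matrix split into two row blocks (illustrative data only).
rows = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
blocks = [rows[:2], rows[2:]]

gram_partitioned = distributed_gram(blocks)
gram_direct = block_gram(rows)
```

Because each block contributes only an n-by-n matrix, where n is the small column count, the data exchanged between workers stays small no matter how many rows the matrix has. A PCA, for example, could then be obtained from an eigendecomposition of this small matrix on a single node.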
