Matrix Factorization at Scale: a Comparison of Scientific Data Analytics in Spark and C+MPI Using Three Case Studies

We explore the trade-offs of performing linear algebra using Apache Spark, compared to traditional C and MPI implementations on HPC platforms. Spark is designed for data analytics on cluster computing platforms with access to local disks and is optimized for data-parallel tasks. We examine three widely used and important matrix factorizations: NMF (for physical plausibility), PCA (for its ubiquity) and CX (for data interpretability). We apply these methods to TB-sized problems in particle physics, climate modeling and bioimaging. The data matrices are tall and skinny, which enables the algorithms to map conveniently onto Spark's data-parallel model. We perform scaling experiments on up to 1600 Cray XC40 nodes, describe the sources of slowdowns, and provide tuning guidance to obtain high performance.
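To illustrate why a tall-and-skinny matrix suits a data-parallel model, consider the Gram matrix A^T A, whose eigenvectors are the right singular vectors used in PCA. If A is split into row blocks B_1, ..., B_k, then A^T A is the sum of the small n-by-n products B_i^T B_i, so each block can be processed independently and the results combined with a reduction. The following is a minimal pure-Python sketch of that map-and-reduce pattern on a toy matrix; the function names and the data are hypothetical, and it is not the implementation evaluated in this work.

```python
from functools import reduce


def block_gram(block):
    """Map step: compute B^T B for one block of rows (a list of equal-length rows)."""
    n = len(block[0])
    g = [[0.0] * n for _ in range(n)]
    for row in block:
        for i in range(n):
            for j in range(n):
                g[i][j] += row[i] * row[j]
    return g


def add_matrices(x, y):
    """Reduce step: elementwise sum of two n-by-n matrices."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def gram(blocks):
    """Combine the per-block n-by-n Gram matrices into A^T A."""
    return reduce(add_matrices, map(block_gram, blocks))


# Toy 6-by-2 matrix (hypothetical data) split into three row blocks.
blocks = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
    [[9.0, 10.0], [11.0, 12.0]],
]
G = gram(blocks)
```

Only the n-by-n partial results move between workers, which is small when the matrix has far more rows than columns. In a distributed setting the `map` and `reduce` calls would be replaced by the framework's own operations over partitions of rows.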