Bridging the Gap between HPC and Big Data frameworks

Apache Spark is a popular framework for data analytics with attractive features such as fault tolerance and interoperability with the Hadoop ecosystem. Unfortunately, many analytics operations in Spark are an order of magnitude or more slower compared to native implementations written with high performance computing tools such as MPI. There is a need to bridge the performance gap while retaining the benefits of the Spark ecosystem such as availability, productivity, and fault tolerance. In this paper, we propose a system for integrating MPI with Spark and analyze the costs and benefits of doing so for four distributed graph and machine learning applications. We show that offloading computation to an MPI environment from within Spark provides 3.1−17.7× speedups on the four sparse applications, including all of the overheads. This opens up an avenue to reuse existing MPI libraries in Spark with little effort.

[1]  Zhiwei Xu,et al.  DataMPI: Extending MPI to Hadoop-Like Big Data Computing , 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.

[2]  Pradeep Dubey,et al.  GraphMat: High performance graph analytics made productive , 2015, Proc. VLDB Endow..

[3]  Scott Shenker,et al.  Making Sense of Performance in Data Analytics Frameworks , 2015, NSDI.

[4]  George Karypis,et al.  A Medium-Grained Algorithm for Sparse Tensor Factorization , 2016, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS).

[5]  Jure Leskovec,et al.  Hidden factors and hidden topics: understanding rating dimensions with review text , 2013, RecSys.

[6]  Gagan Agrawal,et al.  A Framework for Elastic Execution of Existing MPI Programs , 2011, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum.

[7]  James Demmel,et al.  Matrix Factorization at Scale: a Comparison of Scientific Data Analytics in Spark and C+MPI Using Three Case Studies , 2016, ArXiv.

[8]  Pradeep Dubey,et al.  GraphPad: Optimized Graph Primitives for Parallel and Distributed Platforms , 2016, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS).

[9]  Ameet Talwalkar,et al.  MLlib: Machine Learning in Apache Spark , 2015, J. Mach. Learn. Res..

[10]  Jack J. Dongarra,et al.  FT-MPI: Fault Tolerant MPI, Supporting Dynamic Applications in a Dynamic World , 2000, PVM/MPI.

[11]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[12]  Message Passing Interface Forum MPI: A message - passing interface standard , 1994 .

[13]  Yee Whye Teh,et al.  On Smoothing and Inference for Topic Models , 2009, UAI.

[14]  Reynold Xin,et al.  GraphX: a resilient distributed graph system on Spark , 2013, GRADES.

[15]  Tamara G. Kolda,et al.  Tensor Decompositions and Applications , 2009, SIAM Rev..

[16]  Sivasankaran Rajamanickam,et al.  A Case Study of Complex Graph Analysis in Distributed Memory: Implementation and Optimization , 2016, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS).

[17]  Vivek Sarkar,et al.  SWAT: A Programmable, In-Memory, Distributed, High-Performance Computing Platform , 2016, HPDC.

[18]  Bora Uçar,et al.  Scalable sparse tensor decompositions in distributed memory systems , 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.

[19]  Martha Larson,et al.  TFMAP: optimizing MAP for top-n context-aware recommendation , 2012, SIGIR '12.

[20]  Brett W. Bader,et al.  The TOPHITS Model for Higher-Order Web Link Analysis∗ , 2006 .

[21]  G. Karypis,et al.  A Medium-Grained Algorithm for Distributed Sparse Tensor Factorization , 2016 .

[22]  Pradeep Dubey,et al.  Navigating the maze of graph analytics frameworks using massive graph datasets , 2014, SIGMOD Conference.

[23]  Nikos D. Sidiropoulos,et al.  Tensor Decomposition for Signal Processing and Machine Learning , 2016, IEEE Transactions on Signal Processing.

[24]  Scott Klasky,et al.  Exploring Automatic, Online Failure Recovery for Scientific Applications at Extreme Scales , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.

[25]  Peter Sanders,et al.  Thrill: High-performance algorithmic distributed batch data processing with C++ , 2016, 2016 IEEE International Conference on Big Data (Big Data).

[26]  Message P Forum,et al.  MPI: A Message-Passing Interface Standard , 1994 .

[27]  Matei Zaharia,et al.  Matrix Computations and Optimization in Apache Spark , 2015, KDD.

[28]  References , 1971 .

[29]  Judy Qiu,et al.  A Tale of Two Data-Intensive Paradigms: Applications, Abstractions, and Architectures , 2014, 2014 IEEE International Congress on Big Data.

[30]  George Karypis,et al.  L2Knng: Fast Exact K-Nearest Neighbor Graph Construction with L2-Norm Pruning , 2015, CIKM.

[31]  Michael J. Franklin,et al.  Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing , 2012, NSDI.

[32]  James Demmel,et al.  Matrix factorizations at scale: A comparison of scientific data analytics in spark and C+MPI using three case studies , 2016, 2016 IEEE International Conference on Big Data (Big Data).

[33]  Davide Anguita,et al.  Big Data Analytics in the Cloud: Spark on Hadoop vs MPI/OpenMP on Beowulf , 2015, INNS Conference on Big Data.

[34]  Hosung Park,et al.  What is Twitter, a social network or a news media? , 2010, WWW '10.