We report our experiences porting and tuning the Apache Spark data analytics framework on the Cray XC30 (Edison) and XC40 (Cori) systems installed at NERSC. Spark was designed for cloud environments where local disk I/O is cheap and performance is constrained by network latency. Large HPC systems, in contrast, connect diskless nodes with fast networks: without careful tuning, Spark execution there is dominated by I/O performance. In the default configuration, relying on a centralized storage system such as Lustre makes metadata access latency a major bottleneck that severely constrains scalability. We show how to mitigate this by using per-node loopback filesystems for temporary storage. With this technique, we reduce communication (data shuffle) time by multiple orders of magnitude and improve application scalability from O(100) to O(10,000) cores on Cori. Under this configuration, Spark's execution again becomes network dominated, which is reflected in a performance comparison against a cluster with fast local SSDs designed specifically for data-intensive workloads. Owing to slightly faster processors and a better network, Cori outperforms that cluster by an average of 13.7% on the machine learning benchmark suite. This is the first such result, with an HPC system outperforming systems designed for data-intensive workloads. Overall, we believe this paper demonstrates that local disks are not necessary for good performance on data analytics workloads.

Keywords: Spark; Berkeley Data Analytics Stack; Cray XC; Lustre; Shifter
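As a concrete illustration of the mitigation described above, the following minimal PySpark sketch directs Spark's temporary (shuffle and spill) storage to a node-local mount point instead of the shared Lustre filesystem. This is a hedged example, not the paper's actual configuration: the mount path /tmp/spark-local and the application name are hypothetical, and we assume a per-node loopback filesystem (e.g., an XFS image mounted on each node before launch, as Shifter's per-node writable cache can provide) already exists at that path. The spark.local.dir property itself is standard Spark configuration.

    # Hypothetical sketch: point Spark's scratch storage at a node-local
    # loopback mount so file metadata operations are served by the local
    # kernel rather than the Lustre metadata server.
    from pyspark import SparkConf, SparkContext

    conf = (
        SparkConf()
        .setAppName("loopback-scratch-demo")  # illustrative name
        # spark.local.dir controls where shuffle and spill files go;
        # /tmp/spark-local is assumed to be a per-node loopback filesystem.
        .set("spark.local.dir", "/tmp/spark-local")
    )
    sc = SparkContext(conf=conf)

    # A small shuffle to exercise the node-local scratch directories.
    pairs = sc.parallelize(range(1000)).map(lambda x: (x % 10, 1))
    print(pairs.reduceByKey(lambda a, b: a + b).collect())
    sc.stop()

In a real deployment the equivalent effect can also be achieved by setting the SPARK_LOCAL_DIRS environment variable on every node; the key design point is that shuffle files never touch the shared filesystem.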