MDHIM: A Parallel Key/Value Framework for HPC

The long-expected convergence of High Performance Computing (HPC) and Big Data Analytics is upon us. Unfortunately, the computing environments created for each workload are not necessarily conducive to the other. In this paper, we evaluate the ability of traditional high performance computing architectures to run big data analytics. We identify and describe limitations that prevent the seamless use of existing big data analytics tools and software. Specifically, we evaluate the effectiveness of distributed key-value stores for manipulating large data sets across tightly coupled parallel supercomputers. Although existing distributed key-value stores have proven highly effective in cloud environments, we find that their performance degrades on HPC clusters. Accordingly, we have built an HPC-specific key-value store called the Multi-Dimensional Hierarchical Indexing Middleware (MDHIM). Using standard big data benchmarks, we find that MDHIM's performance is more than triple that of Cassandra on HPC systems.
