Portable SHMEMCache: A High-Performance Key-Value Store on OpenSHMEM and MPI

The integration of Big Data frameworks with HPC capabilities has drawn enormous interest in recent years. SHMEMCache is a distributed key-value store built on the OpenSHMEM global address space. It has solved several practical issues in leveraging OpenSHMEM’s one-sided operations for a distributed key-value store, and it provides efficient key-value operations on both commodity machines and supercomputers. However, because it is based solely on OpenSHMEM, SHMEMCache cannot leverage one-sided operations from other software packages. This leads to several limitations. First, SHMEMCache cannot be made available on a wider range of platforms. Second, an opportunity for potential performance improvement is missed. Third, there is a lack of deep understanding of how different one-sided operations fit with SHMEMCache and with distributed key-value stores in general. For example, the one-sided operations in OpenSHMEM and MPI differ in their interfaces, memory semantics, and synchronization methods, all of which can have distinct implications and increase the complexity of supporting both OpenSHMEM and MPI in SHMEMCache. We have therefore undertaken an effort to leverage different one-sided operations for SHMEMCache and propose a design for a portable SHMEMCache. Based on this new framework, we have supported both OpenSHMEM and MPI for SHMEMCache. We have also conducted an extensive set of experiments to compare the performance of the two versions on both commodity machines and the Titan supercomputer.
