Paper information - Accelerating Complex Data Transfer for Cluster Computing

Accelerating Complex Data Transfer for Cluster Computing

The ability to move data quickly between the nodes of a distributed system is important for the performance of cluster computing frameworks such as Hadoop and Spark. We show that, in a cluster with modern networking technology, data serialization is the main bottleneck and source of overhead in the transfer of rich data in systems based on high-level programming languages such as Java. We propose a new data transfer mechanism that avoids serialization altogether by using a shared cluster-wide address space to store data. We describe the design and a prototype implementation of this approach. We show that our mechanism is significantly faster than serialized data transfer, and propose a number of possible applications for it.

Eyal de Lara | Alexy Khrabrov

[1] Kathryn S. McKinley, et al. Hoard: a scalable memory allocator for multithreaded applications, 2000, SIGP.

[2] Sayantan Sur, et al. Memcached Design on High Performance RDMA Capable Interconnects, 2011, 2011 International Conference on Parallel Processing.

[3] Dhabaleswar K. Panda, et al. Accelerating Spark with RDMA for Big Data Processing: Early Experiences, 2014, 2014 IEEE 22nd Annual Symposium on High-Performance Interconnects.

[4] Michael J. Franklin, et al. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing, 2012, NSDI.

[5] Scott Shenker, et al. Making Sense of Performance in Data Analytics Frameworks, 2015, NSDI.

[6] Donald Miller, et al. Using a single address space operating system for distributed computing and high performance, 1999, 1999 IEEE International Performance, Computing and Communications Conference (Cat. No.99CH36305).

[7] Johan Andersson. Kaffemik - a distributed JVM featuring a single address space, 2001, Java Virtual Machine Research and Technology Symposium.

[8] Miguel Castro, et al. FaRM: Fast Remote Memory, 2014, NSDI.