Hippogriff: Efficiently moving data in heterogeneous computing systems

Data movement between the compute and the storage (e.g., GPU and SSD) has been a long-neglected problem in heterogeneous systems, while the inefficiency in existing systems does cause significant loss in both performance and energy efficiency. This paper presents Hippogriff to provide a high-level programming model to simplify data movement between the compute and the storage, and to dynamically schedule data transfers based on system load. By eliminating unnecessary data movement, Hippogriff can speedup single program workloads by 1.17×, and save 17% energy. For multi-program workloads, Hippogriff shows 1.25× speedup. Hippogriff also improves the performance of a GPU-based MapReduce framework by 27%.

[1]  Shinpei Kato,et al.  Zero-copy I/O processing for low-latency GPU computing , 2013, 2013 ACM/IEEE International Conference on Cyber-Physical Systems (ICCPS).

[2]  Trevor N. Mudge,et al.  Understanding and Designing New Server Architectures for Emerging Warehouse-Computing Environments , 2008, 2008 International Symposium on Computer Architecture.

[3]  Myoungsoo Jung,et al.  GPUdrive: Reconsidering Storage Accesses for GPU Acceleration , 2014 .

[4]  Kevin Skadron,et al.  Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).

[5]  Michael Bedford Taylor,et al.  Is dark silicon useful? Harnessing the four horsemen of the coming dark silicon apocalypse , 2012, DAC Design Automation Conference 2012.

[6]  Steven Swanson,et al.  Gordon: using flash memory to build fast, power-efficient clusters for data-intensive applications , 2009, ASPLOS.

[7]  Alessandro Forin,et al.  Direct GPU/FPGA communication Via PCI express , 2012, 2012 41st International Conference on Parallel Processing Workshops.

[8]  John Shalf,et al.  NVMMU: A Non-volatile Memory Management Unit for Heterogeneous GPU-SSD Architectures , 2015, 2015 International Conference on Parallel Architecture and Compilation (PACT).

[9]  Amar Phanishayee,et al.  FAWN: a fast array of wimpy nodes , 2009, SOSP '09.

[10]  Karthikeyan Sankaralingam,et al.  Dark Silicon and the End of Multicore Scaling , 2012, IEEE Micro.

[11]  John D. Owens,et al.  Multi-GPU MapReduce on GPU Clusters , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.

[12]  Karthikeyan Sankaralingam,et al.  Dark silicon and the end of multicore scaling , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).