XSD: Accelerating MapReduce by Harnessing the GPU inside an SSD

Considerable research has recently been conducted on near-data processing techniques, as real-world tasks increasingly involve large-scale, high-dimensional data sets. The advent of solid-state drives (SSDs) has spurred further research because of their processing capability and high internal bandwidth. However, the data processing capability of conventional SSD systems has not been impressive. In particular, they lack the parallel processing abilities that are crucial for data-centric workloads and that are needed to exploit the high internal bandwidth of the SSD. To overcome these shortcomings, we propose a new SSD architecture that integrates a graphics processing unit (GPU). We provide API sets based on the MapReduce framework that allow users to express parallelism in their applications and that exploit the parallelism provided by the embedded GPU. For better performance and utilization, we present optimization strategies to overcome challenges inherent in the SSD architecture. We also develop a performance model that provides an analytical way to tune the SSD design. Our experimental results show that the proposed XSD is approximately 25 times faster than an SSD model incorporating a high-performance embedded CPU and up to 4 times faster than a model incorporating a discrete GPU.
