Paper information - Supporting input dependent access pattern algorithms on GPUs using GPUfs

Accelerating the processing of very large datasets on GPUs is challenging, particularly when algorithms exhibit unpredictable data access patterns. In this paper we investigate the utility of GPUfs, a library that provides direct access to files from GPU programs, for implementing such algorithms. We analyze the system’s bottlenecks and suggest several modifications to the GPUfs design, including a new concurrent hash table for the buffer cache and a highly parallel memory allocator. We evaluate our changes by implementing a real image processing application that creates collages from a dataset of 2 million images. The enhanced GPUfs design improves application performance by 2× over the original GPUfs and outperforms both 12-core parallel CPU and standard CUDA-based GPU implementations, while significantly simplifying GPU application design.

Mark Silberstein | Sagi Shahar
