Hardware support for Local Memory Transactions on GPU Architectures

Graphics Processing Units (GPUs) are popular hardware accelerators for data-parallel applications, enabling the execution of thousands of threads in a Single Instruction Multiple Thread (SIMT) fashion. However, the SIMT execution model is not efficient when code includes critical sections to protect the access to data shared by the running threads. In addition, GPUs offer two shared spaces to the threads, local memory and global memory. Typical solutions to thread synchronization include the use of atomics to implement locks, the serialization of the execution of the critical section, or delegating the execution of the critical section to the host CPU, leading to suboptimal performance. In the multi-core CPU world, transactional memory (TM) was proposed as an alternative to locks to coordinate concurrent threads. Some solutions for GPUs started to appear in the literature. In contrast to these earlier proposals, our approach is to design hardware support for TM in two levels. The first level is a fast and lightweight solution for coordinating threads that share the local memory, while the second level coordinates threads through the global memory. In this paper we present GPU-LocalTM as a hardware TM (HTM) support for the first level. GPU-LocalTM offers simple conflict detection and version management mechanisms that minimize the hardware resources required for its implementation. For the workloads studied, GPU-LocalTM provides between 1.25-80X speedup over serialized critical sections, while the overhead introduced by transaction management is lower than 20%.

[1]  Andrew Brownsword,et al.  Hardware transactional memory for GPU architectures , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[2]  Antonia Zhai,et al.  Lightweight Software Transactions on GPUs , 2014, 2014 43rd International Conference on Parallel Processing.

[3]  Philippas Tsigas,et al.  Towards a Software Transactional Memory for Graphics Processors , 2010, EGPGV@Eurographics.

[4]  Tor M. Aamodt,et al.  Energy efficient GPU transactional memory via space-time optimizations , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[5]  Maurice Herlihy,et al.  Transactional Memory: Architectural Support For Lock-free Data Structures , 1993, Proceedings of the 20th Annual International Symposium on Computer Architecture.

[6]  David R. Kaeli,et al.  Multi2Sim: A simulation framework for CPU-GPU computing , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).

[7]  Depei Qian,et al.  Software Transactional Memory for GPU Architectures , 2014, IEEE Computer Architecture Letters.