Paper information - Efficient Inspected Critical Sections in Data-Parallel GPU Codes

Optimistic concurrency control and STMs rely on the assumption of sparse conflicts. For data-parallel GPU codes with many or with dynamic data dependences, a pessimistic and lock-based approach may be faster, if only GPUs would offer hardware support for GPU-wide fine-grained synchronization. Instead, current GPUs inflict dead- and livelocks on attempts to implement such synchronization in software.

Michael Philippsen | Ronald Veldema | Thorsten Blaß

[1] Depei Qian, et al. Lock-based synchronization for GPU architectures, 2016, Conf. Computing Frontiers.

[2] Henk Corporaal, et al. Fine-Grained Synchronizations and Dataflow Programming on GPUs, 2015, ICS.

[3] Maged M. Michael, et al. Software Transactional Memory: Why Is It Only a Research Toy?, 2008, ACM Queue.

[4] Joel H. Saltz, et al. Run-time parallelization and scheduling of loops, 1989, SPAA '89.

[5] Philippas Tsigas, et al. Towards a Software Transactional Memory for Graphics Processors, 2010, EGPGV@Eurographics.

[6] Joel H. Saltz, et al. Run-Time Parallelization and Scheduling of Loops, 1991, IEEE Trans. Computers.

[7] Arun Ramamurthy, et al. Towards scalar synchronization in SIMT architectures, 2011.

[8] Graham Morgan, et al. PR-STM: Priority Rule Based Software Transactions for the GPU, 2015, Euro-Par.

[9] Depei Qian, et al. Software Transactional Memory for GPU Architectures, 2014, IEEE Computer Architecture Letters.

[10] Anant Agarwal, et al. Smartlocks: lock acquisition scheduling for self-aware synchronization, 2010, ICAC '10.

[11] David R. Kaeli, et al. HQL: A Scalable Synchronization Mechanism for GPUs, 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.

[12] Paul Erdös, et al. On random graphs, I, 1959.

[13] Antonia Zhai, et al. Lightweight Software Transactions on GPUs, 2014, 2014 43rd International Conference on Parallel Processing.

[14] Kunle Olukotun, et al. STAMP: Stanford Transactional Applications for Multi-Processing, 2008, 2008 IEEE International Symposium on Workload Characterization.

[15] Andrew Brownsword, et al. Synchronization via scheduling: techniques for efficiently managing shared state, 2011, PLDI '11.

[16] Alexander Knapp, et al. On the Correctness of the SIMT Execution Model of GPUs, 2012, ESOP.

[17] Lawrence Rauchwerger, et al. Speculative Parallelization of Partially Parallel Loops, 2000, LCR.

[18] Wu-chun Feng, et al. On the Robust Mapping of Dynamic Programming onto a Graphics Processing Unit, 2009, 2009 15th International Conference on Parallel and Distributed Systems.

[19] Wu-chun Feng, et al. Inter-block GPU communication via fast barrier synchronization, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).

[20] Tor M. Aamodt, et al. MIMD synchronization on SIMT architectures, 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).