Specifying and testing GPU workgroup progress models
Margaret Martonosi | John Wickerson | Alastair F. Donaldson | Tyler Sorensen | Hugues Evrard | Lucas F. Salvador | Harmit Raval
[1] Anton Podkopaev et al. Making weak memory models fair, 2020, Proc. ACM Program. Lang.
[2] David A. Wood et al. Independent Forward Progress of Work-groups, 2020, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).
[3] Alastair F. Donaldson et al. Putting Randomized Compiler Testing into Production (Experience Report), 2020, ECOOP.
[4] Xianwei Zhang et al. Autonomous Data-Race-Free GPU Testing, 2019, 2019 IEEE International Symposium on Workload Characterization (IISWC).
[5] Alastair F. Donaldson et al. One Size Doesn't Fit All: Quantifying Performance Portability of Graph Applications on GPUs, 2019, 2019 IEEE International Symposium on Workload Characterization (IISWC).
[6] D. Grimaldi. Amber, 2019, Current Biology.
[7] Roberto Palmieri et al. Don't Forget About Synchronization!: A Case Study of K-Means on GPU, 2019, PMAM@PPoPP.
[8] Hyesoon Kim et al. Translating CUDA to OpenCL for Hardware Generation using Neural Machine Translation, 2019, 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[9] Alastair F. Donaldson et al. GPU Schedulers: How Fair Is Fair Enough?, 2018, CONCUR.
[10] John Wickerson et al. The semantics of transactions and weak memory in x86, Power, ARM, and C++, 2017, PLDI.
[11] Alastair F. Donaldson et al. Automated testing of graphics shader compilers, 2017, Proc. ACM Program. Lang.
[12] Daniel Lustig et al. Automated Synthesis of Comprehensive Memory Model Litmus Test Suites, 2017, ASPLOS.
[13] George A. Constantinides et al. Automatically comparing memory consistency models, 2017, POPL.
[14] Ganesh Gopalakrishnan et al. Portable inter-workgroup barrier synchronisation for GPUs, 2016, OOPSLA.
[15] Tor M. Aamodt et al. MIMD synchronization on SIMT architectures, 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[16] Alastair F. Donaldson et al. Exposing errors related to weak memory in GPU applications, 2016, PLDI.
[17] Martin Burtscher et al. Higher-order and tuple-based massively-parallel prefix sums, 2016, PLDI.
[18] Bruce Merry et al. A Performance Comparison of Sort and Scan Libraries for GPUs, 2015, Parallel Process. Lett.
[19] Wen-mei W. Hwu et al. Heterogeneous System Architecture: A New Compute Platform Infrastructure, 2015.
[20] David A. Patterson et al. The GAP Benchmark Suite, 2015, ArXiv.
[21] John Wickerson et al. The Design and Implementation of a Verification Technique for GPU Kernels, 2015, TOPL.
[22] Ganesh Gopalakrishnan et al. GPU Concurrency: Weak Behaviours and Programming Assumptions, 2015, ASPLOS.
[23] John D. Owens et al. Gunrock: a high-performance graph processing library on the GPU, 2015, PPoPP.
[24] Alastair F. Donaldson et al. Interleaving and Lock-Step Semantics for Analysis and Verification of GPU Kernels, 2013, ESOP.
[25] Adam Betts et al. GPUVerify: a verifier for GPU kernels, 2012, OOPSLA '12.
[26] Jeff A. Stuart et al. A study of Persistent Threads style GPU programming for GPGPU workloads, 2012, 2012 Innovative Parallel Computing (InPar).
[27] Alexander Knapp et al. On the Correctness of the SIMT Execution Model of GPUs, 2012, ESOP.
[28] Peng Li et al. GKLEE: concolic verification and test generation for GPUs, 2012, PPoPP '12.
[29] Radu Mateescu et al. CADP 2011: a toolbox for the construction and analysis of distributed processes, 2012, International Journal on Software Tools for Technology Transfer.
[30] Wu-chun Feng et al. CU2CL: A CUDA-to-OpenCL Translator for Multi- and Many-Core Architectures, 2011, 2011 IEEE 17th International Conference on Parallel and Distributed Systems.
[31] Radu Mateescu et al. A Study of Shared-Memory Mutual Exclusion Protocols Using CADP, 2010, FMICS.
[32] Anjul Patney et al. Task management for irregular-parallel workloads on the GPU, 2010, HPG '10.
[33] Wu-chun Feng et al. Inter-block GPU communication via fast barrier synchronization, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[34] Andreas Moshovos et al. Demystifying GPU microarchitecture through microbenchmarking, 2010, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS).
[35] Philippas Tsigas et al. On dynamic load balancing on graphics processors, 2008, GH '08.
[36] Radu Mateescu et al. A Model Checking Language for Concurrent Value-Passing Systems, 2008, FM.
[37] Christel Baier et al. Principles of model checking, 2008.
[38] Daniel Jackson et al. Software Abstractions - Logic, Language, and Analysis, 2006.
[39] Guy E. Blelloch et al. Scans as Primitive Parallel Operations, 1989, ICPP.
[40] Dexter Kozen et al. Results on the Propositional μ-Calculus, 2001.
[41] Saharon Shelah et al. On the temporal analysis of fairness, 1980, POPL '80.
[42] Saharon Shelah et al. On the Temporal Basis of Fairness, 1980.