Chasing Away RAts: Semantics and evaluation for relaxed atomics on heterogeneous systems