Evaluating the Cost of Atomic Operations on Modern Architectures
[1] Karthikeyan Sankaralingam, et al. Dark Silicon and the End of Multicore Scaling, 2012, IEEE Micro.
[2] Maurice Herlihy, et al. The Art of Multiprocessor Programming, Revised Reprint, 2012.
[3] Guang R. Gao, et al. Toward high-throughput algorithms on many-core architectures, 2012, TACO.
[4] Courtenay T. Vaughan. Application Characteristics and Performance on a Cray XE6, 2011.
[5] Ulrich Brüning, et al. A versatile, low latency HyperTransport core, 2007, FPGA '07.
[6] Robert Schöne, et al. Main memory and cache performance of Intel Sandy Bridge and AMD Bulldozer, 2014, MSPC@PLDI.
[7] Torsten Hoefler, et al. Fault tolerance for remote memory access programming models, 2014, HPDC '14.
[8] Christos Faloutsos, et al. Kronecker Graphs: An Approach to Modeling Networks, 2008, J. Mach. Learn. Res.
[9] Wolfgang E. Nagel, et al. Comparing cache architectures and coherency protocols on x86-64 multicore SMP systems, 2009, 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[10] Cheng Chen, et al. A practical nonblocking queue algorithm using compare-and-swap, 2000, Proceedings Seventh International Conference on Parallel and Distributed Systems (Cat. No. PR00568).
[11] George Coulouris, et al. Distributed Systems: Concepts and Design, 1988.
[12] Carl Staelin, et al. Memory hierarchy performance measurement of commercial dual-core desktop processors, 2008, J. Syst. Archit.
[13] Petr Tuma, et al. Investigating Cache Parameters of x86 Family Processors, 2009, SPEC Benchmark Workshop.
[14] Matthias Hauswirth, et al. Producing wrong data without doing anything obviously wrong!, 2009, ASPLOS.
[15] Hsien-Hsin S. Lee, et al. Supporting cache coherence in heterogeneous multiprocessor systems, 2004, Proceedings Design, Automation and Test in Europe Conference and Exhibition.
[16] Christopher J. Hughes, et al. Performance evaluation of Intel® Transactional Synchronization Extensions for high-performance computing, 2013, SC '13: International Conference for High Performance Computing, Networking, Storage and Analysis.
[17] Matthias S. Müller, et al. Memory Performance and Cache Coherency Effects on an Intel Nehalem Multiprocessor System, 2009, 18th International Conference on Parallel Architectures and Compilation Techniques.
[18] John S. Keen, et al. Measuring Memory Hierarchy Performance of Cache-Coherent Multiprocessors Using Micro Benchmarks, 1997, ACM/IEEE SC 1997 Conference (SC'97).
[19] Brian W. Barrett, et al. Introducing the Graph 500, 2010.
[20] Intel Corporation. Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3A: System Programming Guide, Part 1, 2006.
[21] Torsten Hoefler, et al. Enabling highly-scalable remote memory access programming with MPI-3 One Sided, 2014, Sci. Program.
[22] Torsten Hoefler, et al. Active Access: A Mechanism for High-Performance Distributed Data-Centric Computations, 2015, ICS.
[23] Yehuda Afek, et al. Fast concurrent queues for x86 processors, 2013, PPoPP '13.
[24] John Shalf, et al. Programming Abstractions for Data Locality, 2014.
[25] Nir Shavit, et al. A Hierarchical CLH Queue Lock, 2006, Euro-Par.
[26] Sabela Ramos, et al. Modeling communication in cache-coherent SMP systems: a case-study with Xeon Phi, 2013, HPDC.
[27] Tudor David, et al. Everything you always wanted to know about synchronization but were afraid to ask, 2013, SOSP.
[28] Canqun Yang, et al. MilkyWay-2 supercomputer: system and application, 2014, Frontiers of Computer Science.
[29] Torsten Hoefler, et al. Accelerating Irregular Computations with Hardware Transactional Memory and Active Messages, 2015, HPDC.
[30] Timothy L. Harris, et al. A Pragmatic Implementation of Non-blocking Linked-Lists, 2001, DISC.
[31] Wu-chun Feng, et al. Design and Evaluation of Scalable Concurrent Queues for Many-Core Architectures, 2015, ICPE.
[32] Maged M. Michael, et al. Simple, fast, and practical non-blocking and blocking concurrent queue algorithms, 1996, PODC '96.
[33] Allan Gottlieb, et al. Operating system data structures for shared memory MIMD machines with fetch-and-add, 1988.