Harmonizing Speculative and Non-Speculative Execution in Architectures for Ordered Parallelism
Daniel Sánchez | Joel S. Emer | Suvinay Subramanian | Victor A. Ying | Mark C. Jeffrey | Hyun Ryong Lee
[1] J. Eliot B. Moss. Open Nested Transactions: Semantics and Support , 2006 .
[2] Andrey Brito,et al. Speculative out-of-order event processing with software transaction memory , 2008, DEBS.
[3] Victor Pankratius,et al. A study of transactional memory vs. locks in practice , 2011, SPAA '11.
[4] Wei Liu,et al. Thread-Level Speculation on a CMP can be energy efficient , 2005, ICS '05.
[5] Antonia Zhai,et al. A scalable approach to thread-level speculation , 2000, Proceedings of 27th International Symposium on Computer Architecture.
[6] Keshav Pingali,et al. Exploiting the commutativity lattice , 2011, PLDI '11.
[7] Christos Faloutsos,et al. R-MAT: A Recursive Model for Graph Mining , 2004, SDM.
[8] Easwaran Raman,et al. Speculative Decoupled Software Pipelining , 2007, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).
[9] Josep Torrellas,et al. OmniOrder: Directory-based conflict serialization of transactions , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).
[10] Keshav Pingali,et al. Ordered vs. unordered: a comparison of parallelism and work-efficiency in irregular algorithms , 2011, PPoPP '11.
[11] Hiroshi Nakashima,et al. A mechanism for speculative memory accesses following synchronizing operations , 2000, Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000.
[12] Charles E. Leiserson,et al. Ordering heuristics for parallel graph coloring , 2014, SPAA.
[13] Maurice Herlihy,et al. Transactional Memory: Architectural Support For Lock-free Data Structures , 1993, Proceedings of the 20th Annual International Symposium on Computer Architecture.
[14] Dan Grossman,et al. Lock Prediction .
[15] James R. Goodman,et al. Transactional lock-free execution of lock-based programs , 2002, ASPLOS X.
[16] Tarek S. Abdelrahman,et al. Hardware Support for Relaxed Concurrency Control in Transactional Memory , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[17] Martin C. Rinard,et al. Commutativity analysis: a new analysis technique for parallelizing compilers , 1997, TOPLAS.
[18] James R. Goodman,et al. Efficient Synchronization: Let Them Eat QOLB , 1997, International Symposium on Computer Architecture.
[19] Keshav Pingali,et al. The tao of parallelism in algorithms , 2011, PLDI '11.
[20] Christoforos E. Kozyrakis,et al. Flexible architectural support for fine-grain scheduling , 2010, ASPLOS XV.
[21] James Bennett,et al. The Netflix Prize , 2007 .
[22] Mark Moir,et al. Simplifying concurrent algorithms by exploiting hardware transactional memory , 2010, SPAA '10.
[23] Michael L. Scott,et al. Sandboxing transactional memory , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).
[24] Larry Carter,et al. Universal classes of hash functions (Extended Abstract) , 1977, STOC '77.
[25] Guy E. Blelloch,et al. Julienne: A Framework for Parallel Graph Algorithms using Work-efficient Bucketing , 2017, SPAA.
[26] Guy E. Blelloch,et al. Ligra: a lightweight graph processing framework for shared memory , 2013, PPoPP '13.
[27] David A. Wood,et al. LogTM-SE: Decoupling Hardware Transactional Memory from Caches , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[28] Eddie Kohler,et al. Speedy transactions in multicore in-memory databases , 2013, SOSP.
[29] Cong Yan,et al. A scalable architecture for ordered parallelism , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[30] Rachid Guerraoui,et al. On the correctness of transactional memory , 2008, PPoPP.
[31] Daniel Sánchez,et al. SAM: Optimizing Multithreaded Cores for Speculative Parallelism , 2017, 2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[32] Krste Asanovic,et al. Controlling program execution through binary instrumentation , 2005, CARN.
[33] Steven L. Scott,et al. Synchronization and communication in the T3E multiprocessor , 1996, ASPLOS VII.
[34] Josep Torrellas,et al. Tradeoffs in buffering speculative memory state for thread-level speculation in multiprocessors , 2005, TACO.
[35] David A. Wood,et al. Performance Pathologies in Hardware Transactional Memory , 2007, IEEE Micro.
[36] Emmett Witchel,et al. Is transactional programming actually easier? , 2010, PPoPP '10.
[37] Kunle Olukotun,et al. Data speculation support for a chip multiprocessor , 1998, ASPLOS VIII.
[38] Kunle Olukotun,et al. Transactional memory coherence and consistency , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.
[39] Hansen Zhang,et al. Hardware Multithreaded Transactions , 2018, ASPLOS.
[40] F. Maxwell Harper,et al. The MovieLens Datasets: History and Context , 2016, TIIS.
[41] Niraj K. Jha,et al. GARNET: A detailed on-chip network model inside a full-system simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[42] David A. Wood,et al. LogTM: log-based transactional memory , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006.
[43] Josep Torrellas,et al. Hardware for speculative run-time parallelization in distributed shared-memory multiprocessors , 1998, Proceedings 1998 Fourth International Symposium on High-Performance Computer Architecture.
[44] David R. Jefferson,et al. Virtual time , 1985, ICPP.
[45] Matei Zaharia,et al. Making caches work for graph analytics , 2016, 2017 IEEE International Conference on Big Data (Big Data).
[46] David A. Wood,et al. Supporting nested transactional memory in LogTM , 2006, ASPLOS XII.
[47] Todd C. Mowry,et al. The potential for using thread-level data speculation to facilitate automatic parallelization , 1998, Proceedings 1998 Fourth International Symposium on High-Performance Computer Architecture.
[48] Luke Dalessandro,et al. Strong Isolation is a Weak Idea , 2009 .
[49] Bradley C. Kuszmaul. SuperMalloc: a super fast multithreaded malloc for 64-bit machines , 2015, ISMM.
[50] Guy E. Blelloch,et al. Brief announcement: the problem based benchmark suite , 2012, SPAA '12.
[51] Emmett Witchel,et al. Dependence-aware transactional memory for increased concurrency , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.
[52] Doug Burger,et al. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches , 2002, ASPLOS X.
[53] T. N. Vijaykumar,et al. Wait-n-GoTM: improving HTM performance by serializing cyclic dependencies , 2013, ASPLOS '13.
[54] Harish Patil,et al. Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.
[55] Kunle Olukotun,et al. STAMP: Stanford Transactional Applications for Multi-Processing , 2008, 2008 IEEE International Symposium on Workload Characterization.
[56] Joel Emer,et al. Unlocking Ordered Parallelism with the Swarm Architecture , 2016, IEEE Micro.
[57] Christopher J. Hughes,et al. Performance evaluation of Intel® Transactional Synchronization Extensions for high-performance computing , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[58] Constantine D. Polychronopoulos,et al. Fast barrier synchronization hardware , 1990, Proceedings SUPERCOMPUTING '90.
[59] Guang R. Gao,et al. Synchronization state buffer: supporting efficient fine-grain synchronization on many-core architectures , 2007, ISCA '07.
[60] Henry Hoffmann,et al. On-Chip Interconnection Architecture of the Tile Processor , 2007, IEEE Micro.
[61] Josep Torrellas,et al. Speculative synchronization: applying thread-level speculation to explicitly parallel applications , 2002, ASPLOS X.
[62] Eduard Ayguadé,et al. Task Superscalar: An Out-of-Order Task Pipeline , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[63] A. McDonald,et al. Architectural Semantics for Practical Transactional Memory , 2006, ISCA 2006.
[64] Donald E. Porter,et al. TxLinux: using and managing hardware transactional memory in an operating system , 2007, SOSP.
[65] Maged M. Michael,et al. Robust architectural support for transactional memory in the power architecture , 2013, ISCA.
[66] Charles E. Leiserson,et al. A work-efficient parallel breadth-first search algorithm (or how to cope with the nondeterminism of reducers) , 2010, SPAA '10.
[67] Milo M. K. Martin,et al. Subtleties of transactional memory atomicity semantics , 2006, IEEE Computer Architecture Letters.
[68] Corporate Unix Press. System V application binary interface (3rd ed.) , 1993 .
[69] Arturo González-Escribano,et al. A Survey on Thread-Level Speculation Techniques , 2016, ACM Comput. Surv..
[70] Ulrich Meyer,et al. Δ-stepping: a parallelizable shortest path algorithm , 2003, J. Algorithms.
[71] Simon L. Peyton Jones,et al. Composable memory transactions , 2005, CACM.
[72] Josep Torrellas,et al. Bulk Disambiguation of Speculative Threads in Multiprocessors , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).
[73] Emilio L. Zapata,et al. Effective Transactional Memory Execution Management for Improved Concurrency , 2014, ACM Trans. Archit. Code Optim..
[74] Dean M. Tullsen,et al. Mapping Out a Path from Hardware Transactional Memory to Speculative Multithreading , 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.
[75] Adam Silberstein,et al. Benchmarking cloud serving systems with YCSB , 2010, SoCC '10.
[76] Daniel Sánchez,et al. Fractal: An execution model for fine-grain nested speculative parallelism , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[77] J. P. Grossman,et al. Hardware support for fine-grained event-driven computation in Anton 2 , 2013, ASPLOS '13.
[78] Michael M. Swift,et al. Pathological Interaction of Locks with Transactional Memory , 2008 .
[79] Craig Zilles,et al. Extending Hardware Transactional Memory to Support Non-busy Waiting and Non-transactional Actions , 2006 .
[80] William J. Dally,et al. The J-machine Multicomputer: An Architectural Evaluation , 1993, Proceedings of the 20th Annual International Symposium on Computer Architecture.
[81] Timothy A. Davis,et al. The university of Florida sparse matrix collection , 2011, TOMS.
[82] Eddie Kohler,et al. The scalable commutativity rule , 2017, Commun. ACM.
[83] Antonia Zhai,et al. Compiler optimization of memory-resident value communication between speculative threads , 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004.
[84] Craig B. Zilles,et al. An Analysis of I/O And Syscalls In Critical Sections And Their Implications For Transactional Memory , 2008, ISPASS 2008 - IEEE International Symposium on Performance Analysis of Systems and software.
[85] Daniel Sánchez,et al. Data-centric execution of speculative parallel programs , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[86] Christopher J. Hughes,et al. Carbon: architectural support for fine-grained parallelism on chip multiprocessors , 2007, ISCA '07.
[87] Michael M. Swift,et al. Condition Variables and Transactional Memory: Problem or Opportunity? , 2009 .
[88] Cody Cutler,et al. Phase Reconciliation for Contended In-Memory Transactions , 2014, OSDI.
[89] Mateo Valero,et al. Architectural Support for Task Dependence Management with Flexible Software Scheduling , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[90] William J. Dally,et al. Exploiting fine-grain thread level parallelism on the MIT multi-ALU processor , 1998, Proceedings. 25th Annual International Symposium on Computer Architecture.
[91] Wei Liu,et al. Tasking with out-of-order spawn in TLS chip multiprocessors: microarchitecture and compilation , 2005, ICS '05.
[92] Ali-Reza Adl-Tabatabai,et al. McRT-Malloc: a scalable transactional memory allocator , 2006, ISMM '06.
[93] Milo M. K. Martin,et al. Making the fast case common and the uncommon case simple in unbounded transactional memory , 2007, ISCA '07.
[94] Benoît Dupont de Dinechin,et al. A clustered manycore processor architecture for embedded and accelerated applications , 2013, 2013 IEEE High Performance Extreme Computing Conference (HPEC).