SAM: Optimizing Multithreaded Cores for Speculative Parallelism
Daniel Sánchez | Joel S. Emer | Suvinay Subramanian | Maleen Abeydeera | Mark C. Jeffrey
[1] Kunle Olukotun,et al. Niagara: a 32-way multithreaded Sparc processor , 2005, IEEE Micro.
[2] Josep Torrellas,et al. Tradeoffs in buffering speculative memory state for thread-level speculation in multiprocessors , 2005, TACO.
[3] Josep Torrellas,et al. BulkSMT: Designing SMT processors for atomic-block execution , 2012, IEEE International Symposium on High-Performance Computer Architecture.
[4] David A. Wood,et al. Performance Pathologies in Hardware Transactional Memory , 2007, IEEE Micro.
[5] Brad Calder,et al. Threaded multiple path execution , 1998, Proceedings. 25th Annual International Symposium on Computer Architecture.
[6] Kunle Olukotun,et al. Data speculation support for a chip multiprocessor , 1998, ASPLOS VIII.
[7] Timothy J. Slegel,et al. Transactional Memory Architecture and Implementation for IBM System Z , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[8] Larry Carter,et al. Universal classes of hash functions (Extended Abstract) , 1977, STOC '77.
[9] Kunle Olukotun,et al. STAMP: Stanford Transactional Applications for Multi-Processing , 2008, 2008 IEEE International Symposium on Workload Characterization.
[10] Lawrence Rauchwerger,et al. The LRPD test: speculative run-time parallelization of loops with privatization and reduction parallelization , 1995, PLDI '95.
[11] Keshav Pingali,et al. The tao of parallelism in algorithms , 2011, PLDI '11.
[12] Dean M. Tullsen,et al. Simultaneous multithreading: Maximizing on-chip parallelism , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[13] Dean M. Tullsen,et al. Symbiotic jobscheduling for a simultaneous multithreading processor , 2000 .
[14] Burton J. Smith,et al. The architecture of HEP , 1985 .
[15] Ronald G. Dreslinski,et al. Proactive transaction scheduling for contention management , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[16] Maged M. Michael,et al. Evaluation of Blue Gene/Q hardware support for transactional memories , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).
[17] Antonia Zhai,et al. Efficiency of thread-level speculation in SMT and CMP architectures - performance, power and thermal perspective , 2008, 2008 IEEE International Conference on Computer Design.
[18] Todd C. Mowry,et al. The potential for using thread-level data speculation to facilitate automatic parallelization , 1998, Proceedings 1998 Fourth International Symposium on High-Performance Computer Architecture.
[19] Josep Torrellas,et al. Bulk Disambiguation of Speculative Threads in Multiprocessors , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).
[20] Christoforos E. Kozyrakis,et al. ZSim: fast and accurate microarchitectural simulation of thousand-core systems , 2013, ISCA.
[21] T. N. Vijaykumar,et al. Implicitly-multithreaded processors , 2003, 30th Annual International Symposium on Computer Architecture, 2003. Proceedings.
[22] Ronald G. Dreslinski,et al. Bloom Filter Guided Transaction Scheduling , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[23] Tor M. Aamodt,et al. Energy efficient GPU transactional memory via space-time optimizations , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[24] Robert Golla,et al. T4: A highly threaded server-on-a-chip with native support for heterogeneous computing , 2011, 2011 IEEE Hot Chips 23 Symposium (HCS).
[25] Yen-Chen Liu,et al. Knights Landing: Second-Generation Intel Xeon Phi Product , 2016, IEEE Micro.
[26] Peter S. Pacheco. Parallel programming with MPI , 1996 .
[27] Mahmut T. Kandemir,et al. OWL: cooperative thread array aware scheduling techniques for improving GPGPU performance , 2013, ASPLOS '13.
[28] José González,et al. Meeting points: Using thread criticality to adapt multicore hardware to parallel regions , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[29] Steven K. Reinhardt,et al. The impact of resource partitioning on SMT processors , 2003, 2003 12th International Conference on Parallel Architectures and Compilation Techniques.
[30] William N. Scherer,et al. Advanced contention management for dynamic software transactional memory , 2005, PODC '05.
[31] Doug Burger,et al. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches , 2002, ASPLOS X.
[32] Maurice Herlihy,et al. Virtualizing transactional memory , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[33] Harish Patil,et al. Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.
[34] Andrew Brownsword,et al. Hardware transactional memory for GPU architectures , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[35] Yunheung Paek,et al. Parallel Programming with Polaris , 1996, Computer.
[36] Josep Torrellas,et al. BulkSC: bulk enforcement of sequential consistency , 2007, ISCA '07.
[37] Nir Shavit,et al. Software transactional memory , 1995, PODC '95.
[38] Timothy A. Davis,et al. The University of Florida sparse matrix collection , 2011, TOMS.
[39] Joel Emer,et al. Unlocking Ordered Parallelism with the Swarm Architecture , 2016, IEEE Micro.
[40] Donald S. Fussell,et al. Priority-based cache allocation in throughput processors , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[41] Mike Houston,et al. GPUs: A Closer Look , 2008, ACM Queue.
[42] Haitham Akkary,et al. A dynamic multithreading processor , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[43] Onur Mutlu,et al. Bottleneck identification and scheduling in multithreaded applications , 2012, ASPLOS XVII.
[44] Gurindar S. Sohi,et al. Multiscalar processors , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[45] Daniel Sánchez,et al. Fractal: An execution model for fine-grain nested speculative parallelism , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[46] Francisco J. Cazorla,et al. A dynamic scheduler for balancing HPC applications , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[47] Hsien-Hsin S. Lee,et al. Adaptive transaction scheduling for transactional memory systems , 2008, SPAA '08.
[48] Brandon Lucia,et al. DMP: Deterministic Shared-Memory Multiprocessing , 2010, IEEE Micro.
[49] Scott A. Mahlke,et al. Mascar: Speeding up GPU warps by reducing memory pitstops , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[50] Eddie Kohler,et al. Speedy transactions in multicore in-memory databases , 2013, SOSP.
[51] Cong Yan,et al. A scalable architecture for ordered parallelism , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[52] Francisco J. Cazorla,et al. Software-Controlled Priority Characterization of POWER5 Processor , 2008, 2008 International Symposium on Computer Architecture.
[53] Daniel Sánchez,et al. Data-centric execution of speculative parallel programs , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[54] Emmett Witchel,et al. Dependence-aware transactional memory for increased concurrency , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.
[55] T. N. Vijaykumar,et al. Wait-n-GoTM: improving HTM performance by serializing cyclic dependencies , 2013, ASPLOS '13.
[56] Wei Liu,et al. Tasking with out-of-order spawn in TLS chip multiprocessors: microarchitecture and compilation , 2005, ICS '05.
[57] Onur Mutlu,et al. Improving GPU performance via large warps and two-level warp scheduling , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[58] Christopher J. Hughes,et al. Performance evaluation of Intel® Transactional Synchronization Extensions for high-performance computing , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[59] T. N. Vijaykumar,et al. Timetraveler: exploiting acyclic races for optimizing memory race recording , 2010, ISCA.
[60] Niraj K. Jha,et al. GARNET: A detailed on-chip network model inside a full-system simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[61] David A. Wood,et al. LogTM: log-based transactional memory , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006.
[62] Josep Torrellas,et al. Hardware for speculative run-time parallelization in distributed shared-memory multiprocessors , 1998, Proceedings 1998 Fourth International Symposium on High-Performance Computer Architecture.
[63] David R. Jefferson,et al. Virtual time , 1985, ICPP.
[64] Onur Mutlu,et al. Accelerating critical section execution with asymmetric multi-core architectures , 2009, ASPLOS.
[65] Kunle Olukotun,et al. Transactional memory coherence and consistency , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.
[66] Maged M. Michael,et al. Quantitative comparison of Hardware Transactional Memory for Blue Gene/Q, zEnterprise EC12, Intel Core, and POWER8 , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[67] Dean M. Tullsen,et al. Symbiotic jobscheduling for a simultaneous multithreading processor , 2000, SIGP.
[68] Tarek S. Abdelrahman,et al. Hardware Support for Relaxed Concurrency Control in Transactional Memory , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[69] Wei Liu,et al. Thread-Level Speculation on a CMP can be energy efficient , 2005, ICS '05.
[70] Antonia Zhai,et al. A scalable approach to thread-level speculation , 2000, Proceedings of 27th International Symposium on Computer Architecture.
[71] Keshav Pingali,et al. Ordered vs. unordered: a comparison of parallelism and work-efficiency in irregular algorithms , 2011, PPoPP '11.
[72] Jack L. Lo,et al. Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).
[73] Krste Asanovic,et al. Controlling program execution through binary instrumentation , 2005, CARN.
[74] Maurice Herlihy,et al. Transactional Memory: Architectural Support For Lock-free Data Structures , 1993, Proceedings of the 20th Annual International Symposium on Computer Architecture.
[75] D. J. A. Welsh,et al. An upper bound for the chromatic number of a graph and its application to timetabling problems , 1967, Comput. J..
[76] Dean M. Tullsen,et al. Supporting fine-grained synchronization on a simultaneous multithreading processor , 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.
[77] Mike O'Connor,et al. Cache-Conscious Wavefront Scheduling , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[78] Maged M. Michael,et al. Robust architectural support for transactional memory in the power architecture , 2013, ISCA.
[79] Easwaran Raman,et al. Speculative Decoupled Software Pipelining , 2007, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).
[80] James R. Goodman,et al. Transactional lock-free execution of lock-based programs , 2002, ASPLOS X.
[81] Robert Morris,et al. Non-scalable locks are dangerous , 2012 .
[82] Guy E. Blelloch,et al. Internally deterministic parallel algorithms can be fast , 2012, PPoPP '12.
[83] Jure Leskovec,et al. SNAP Datasets: Stanford Large Network Dataset Collection , 2014 .
[84] Kunle Olukotun,et al. Maximizing CMP throughput with mediocre cores , 2005, 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05).
[85] Arturo González-Escribano,et al. A Survey on Thread-Level Speculation Techniques , 2016, ACM Comput. Surv..
[86] Josep Torrellas,et al. OmniOrder: Directory-based conflict serialization of transactions , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).
[87] Mikel Luján,et al. Steal-on-Abort: Improving Transactional Memory Performance through Dynamic Transaction Reordering , 2008, HiPEAC.
[88] Charles E. Leiserson,et al. Ordering heuristics for parallel graph coloring , 2014, SPAA.
[89] David A. Wood,et al. LogTM-SE: Decoupling Hardware Transactional Memory from Caches , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.