Complementing user-level coarse-grain parallelism with implicit speculative parallelism
[1] James Tschanz,et al. Parameter variations and impact on circuits and microarchitecture , 2003, Proceedings 2003. Design Automation Conference (IEEE Cat. No.03CH37451).
[2] Meeta Sharma Gupta,et al. System level analysis of fast, per-core DVFS using on-chip switching regulators , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.
[3] Gurindar S. Sohi,et al. Speculative versioning cache , 1998, Proceedings 1998 Fourth International Symposium on High-Performance Computer Architecture.
[4] Kunle Olukotun,et al. A Scalable, Non-blocking Approach to Transactional Memory , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[5] Kunle Olukotun,et al. An effective hybrid transactional memory system with strong isolation guarantees , 2007, ISCA '07.
[6] James E. Smith,et al. Managing multi-configuration hardware via dynamic working set analysis , 2002, ISCA.
[7] Margaret Martonosi,et al. Live, Runtime Phase Monitoring and Prediction on Real Systems with Application to Dynamic Power Management , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[8] G. Amdahl,et al. Validity of the single processor approach to achieving large scale computing capabilities , 1967, AFIPS '67 (Spring).
[9] Marcelo Cintra,et al. Handling branches in TLS systems with Multi-Path Execution , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.
[10] Feng Liu,et al. Scalable Speculative Parallelization on Commodity Clusters , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[11] Chen Ding,et al. Locality phase prediction , 2004, ASPLOS XI.
[12] Anoop Gupta,et al. The directory-based cache coherence protocol for the DASH multiprocessor , 1990, ISCA '90.
[13] Milind Girkar,et al. On the performance potential of different types of speculative thread-level parallelism , 2006, ICS '06.
[14] Rudolf Eigenmann,et al. Min-cut program decomposition for thread-level speculation , 2004, PLDI '04.
[15] David A. Wood,et al. Supporting nested transactional memory in LogTM , 2006, ASPLOS XII.
[16] Sandhya Dwarkadas,et al. Characterizing and predicting program behavior and its variability , 2003, 2003 12th International Conference on Parallel Architectures and Compilation Techniques.
[17] Mitsuhisa Sato,et al. Performance Evaluation of the Omni OpenMP Compiler , 2000, ISHPC.
[18] Michael L. Scott,et al. Profile-based dynamic voltage and frequency scaling for a multiple clock domain microprocessor , 2003, ISCA '03.
[19] Bronis R. de Supinski,et al. Adagio: making DVS practical for complex HPC applications , 2009, ICS.
[20] Todd C. Mowry,et al. The potential for using thread-level data speculation to facilitate automatic parallelization , 1998, Proceedings 1998 Fourth International Symposium on High-Performance Computer Architecture.
[21] Wei Liu,et al. POSH: a TLS compiler that exploits program structure , 2006, PPoPP '06.
[22] Antonia Zhai,et al. Compiler optimization of scalar value communication between speculative threads , 2002, ASPLOS X.
[23] Margaret Martonosi,et al. Multipath execution: opportunities and limits , 1998, ICS '98.
[24] Margaret Martonosi,et al. Dynamic thermal management for high-performance microprocessors , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.
[25] Dean M. Tullsen,et al. Interconnections in multi-core architectures: understanding mechanisms, overheads and scaling , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[26] Michael C. Huang,et al. Positional adaptation of processors: application to energy reduction , 2003, ISCA '03.
[27] Nikolas Ioannou,et al. Increasing the energy efficiency of TLS systems using intermediate checkpointing , 2011, 2011 18th International Conference on High Performance Computing.
[28] Josep Torrellas,et al. Speculative synchronization: applying thread-level speculation to explicitly parallel applications , 2002, ASPLOS X.
[29] D.K. Lowenthal,et al. Adaptive, Transparent Frequency and Voltage Scaling of Communication Phases in MPI Programs , 2006, ACM/IEEE SC 2006 Conference (SC'06).
[30] Pedro López,et al. Anaphase: A Fine-Grain Thread Decomposition Scheme for Speculative Multithreading , 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.
[31] José González,et al. Meeting points: Using thread criticality to adapt multicore hardware to parallel regions , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[32] Tajana Simunic,et al. Dynamic voltage frequency scaling for multi-tasking systems using online learning , 2007, Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07).
[33] Gu-Yeon Wei,et al. Thread motion: fine-grained power management for multi-core systems , 2009, ISCA '09.
[34] Timothy Mattson,et al. A 48-Core IA-32 message-passing processor with DVFS in 45nm CMOS , 2010, 2010 IEEE International Solid-State Circuits Conference - (ISSCC).
[35] Nikolas Ioannou,et al. Combining thread level speculation helper threads and runahead execution , 2009, ICS.
[36] Michael L. Scott,et al. Algorithms for scalable synchronization on shared-memory multiprocessors , 1991, TOCS.
[37] Mark D. Hill,et al. Amdahl's Law in the Multicore Era , 2008, Computer.
[38] Dimitrios S. Nikolopoulos,et al. Online power-performance adaptation of multithreaded programs using hardware event-based prediction , 2006, ICS '06.
[39] John L. Gustafson,et al. Reevaluating Amdahl's law , 1988, CACM.
[40] Margaret Martonosi,et al. An Analysis of Efficient Multi-Core Global Power Management Policies: Maximizing Performance for a Given Power Budget , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[41] Yun Zhang,et al. Decoupled software pipelining creates parallelization opportunities , 2010, CGO '10.
[42] Josep Torrellas,et al. Tradeoffs in buffering memory state for thread-level speculation in multiprocessors , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings.
[43] Anoop Gupta,et al. The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.
[44] Gurindar S. Sohi,et al. ARB: A Hardware Mechanism for Dynamic Reordering of Memory References , 1996, IEEE Trans. Computers.
[45] Josep Torrellas,et al. Architectural support for scalable speculative parallelization in shared-memory multiprocessors , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[46] Allan Gottlieb,et al. Highly parallel computing , 1989, Benjamin/Cummings Series in computer science and engineering.
[47] Margaret Martonosi,et al. Wattch: a framework for architectural-level power analysis and optimizations , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[48] Robert D. Blumofe,et al. Scheduling multithreaded computations by work stealing , 1994, Proceedings 35th Annual Symposium on Foundations of Computer Science.
[49] Easwaran Raman,et al. Spice: speculative parallel iteration chunk execution , 2008, CGO '08.
[50] Bronis R. de Supinski,et al. Prediction models for multi-dimensional power-performance optimization on many cores , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[51] Pedro López,et al. Boosting single-thread performance in multi-core systems through fine-grain multi-threading , 2009, ISCA '09.
[52] Josep Torrellas,et al. Bulk Disambiguation of Speculative Threads in Multiprocessors , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).
[53] Dean M. Tullsen,et al. Mapping Out a Path from Hardware Transactional Memory to Speculative Multithreading , 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.
[54] Mahmut T. Kandemir,et al. Exploiting barriers to optimize power consumption of CMPs , 2005, 19th IEEE International Parallel and Distributed Processing Symposium.
[55] David A. Wood,et al. LogTM-SE: Decoupling Hardware Transactional Memory from Caches , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[56] S. Borkar,et al. An 80-Tile Sub-100-W TeraFLOPS Processor in 65-nm CMOS , 2008, IEEE Journal of Solid-State Circuits.
[57] James R. Goodman,et al. Speculative lock elision: enabling highly concurrent multithreaded execution , 2001, MICRO.
[58] David K. Lowenthal,et al. Using multiple energy gears in MPI programs on a power-scalable cluster , 2005, PPoPP.
[59] Milo M. K. Martin,et al. Deconstructing Transactional Semantics: The Subtleties of Atomicity , 2005 .
[60] Wei Liu,et al. Tasking with out-of-order spawn in TLS chip multiprocessors: microarchitecture and compilation , 2005, ICS '05.
[61] Onur Mutlu,et al. Runahead execution: an alternative to very large instruction windows for out-of-order processors , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings.
[62] Vivek Sarkar,et al. Partitioning parallel programs for macro-dataflow , 1986, LFP '86.
[63] Ryan N. Rakvic,et al. The Fuzzy Correlation between Code and Performance Predictability , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).
[64] Rudolf Eigenmann,et al. Cetus: A Source-to-Source Compiler Infrastructure for Multicores , 2009, Computer.
[65] Marc Tremblay,et al. A Third-Generation 65nm 16-Core 32-Thread Plus 32-Scout-Thread CMT SPARC® Processor , 2008, 2008 IEEE International Solid-State Circuits Conference - Digest of Technical Papers.
[66] Michael L. Scott,et al. Energy-efficient processor design using multiple clock domains with dynamic voltage and frequency scaling , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.
[67] Engin Ipek,et al. Core fusion: accommodating software diversity in chip multiprocessors , 2007, ISCA '07.
[68] Michael C. Huang,et al. The thrifty barrier: energy-aware synchronization in shared-memory multiprocessors , 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).
[69] Saurabh Dighe,et al. Within-die variation-aware dynamic-voltage-frequency scaling core mapping and thread hopping for an 80-core processor , 2010, 2010 IEEE International Solid-State Circuits Conference - (ISSCC).
[70] Steven M. Nowick,et al. Robust interfaces for mixed-timing systems with application to latency-insensitive protocols , 2001, Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232).
[71] Luis Ceze,et al. Implicit parallelism with ordered transactions , 2007, PPoPP.
[72] Michael C. Huang,et al. Dynamically Tuning Processor Resources with Adaptive Processing , 2003, Computer.
[73] Kunle Olukotun,et al. The Stanford Hydra CMP , 2000, IEEE Micro.
[74] J. Gregory Steffan,et al. Improving cache locality for thread-level speculation , 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.
[75] Kunle Olukotun,et al. Data speculation support for a chip multiprocessor , 1998, ASPLOS VIII.
[76] Marc Tremblay,et al. Simultaneous speculative threading: a novel pipeline architecture implemented in Sun's ROCK processor , 2009, ISCA '09.
[77] Yale N. Patt,et al. Simultaneous subordinate microthreading (SSMT) , 1999, ISCA.
[78] Gurindar S. Sohi,et al. Task selection for a multiscalar processor , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[79] Stijn Eyerman,et al. Modeling critical sections in Amdahl's law and its implications for multicore design , 2010, ISCA '10.
[80] David H. Bailey,et al. The NAS parallel benchmarks summary and preliminary results , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[81] Brad Calder,et al. Phase tracking and prediction , 2003, ISCA '03.
[82] Gavin Brown,et al. Toward a more accurate understanding of the limits of the TLS execution paradigm , 2010, IEEE International Symposium on Workload Characterization (IISWC'10).
[83] L. Dagum,et al. OpenMP: an industry standard API for shared-memory programming , 1998 .
[84] Onur Mutlu,et al. Accelerating critical section execution with asymmetric multi-core architectures , 2009, ASPLOS.
[85] Antonia Zhai,et al. A scalable approach to thread-level speculation , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[86] Anastasia Ailamaki,et al. Tolerating Dependences Between Large Speculative Threads Via Sub-Threads , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).
[87] Margaret Martonosi,et al. Thread criticality predictors for dynamic performance, power, and resource management in chip multiprocessors , 2009, ISCA '09.
[88] Antonio González,et al. Clustered speculative multithreaded processors , 1999, ICS '99.
[89] V. Rich. Personal communication , 1989, Nature.
[90] Dean M. Tullsen,et al. Mitosis compiler: an infrastructure for speculative threading based on pre-computation slices , 2005, PLDI '05.
[91] Antonia Zhai,et al. Dynamic performance tuning for speculative threads , 2009, ISCA '09.
[92] Kai Li,et al. The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[93] Babak Falsafi,et al. Multiplex: unifying conventional and speculative thread-level parallelism on a chip multiprocessor , 2001, ICS '01.
[94] Antonia Zhai,et al. The STAMPede approach to thread-level speculation , 2005, TOCS.
[95] Gurindar S. Sohi,et al. Multiscalar processors , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[96] David K. Lowenthal,et al. Minimizing execution time in MPI programs on an energy-constrained, power-scalable cluster , 2006, PPoPP '06.
[97] Sally A. McKee,et al. Understanding PARSEC performance on contemporary CMPs , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[98] Alejandro Martinez,et al. Boosting single-thread performance in multi-core systems through fine-grain multi-threading , 2009 .
[99] Josep Torrellas,et al. Hardware and software support for speculative execution of sequential binaries on a chip-multiprocessor , 1998, ICS '98.
[100] James R. Goodman,et al. Transactional lock-free execution of lock-based programs , 2002, ASPLOS X.
[101] Margaret Martonosi,et al. Techniques for Multicore Thermal Management: Classification and New Exploration , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).
[102] Manoj Franklin,et al. A general compiler framework for speculative multithreading , 2002, SPAA '02.
[103] Jose Renau,et al. Energy-Efficient Thread-Level Speculation , 2006, IEEE Micro.
[104] Shekhar Y. Borkar,et al. Thousand Core Chips: A Technology Perspective , 2007, 2007 44th ACM/IEEE Design Automation Conference.
[105] Edward A. Lee. The problem with threads , 2006, Computer.
[106] Bradford Nichols,et al. Pthreads programming , 1996 .
[107] Murali Annavaram,et al. Mitigating Amdahl's Law through EPI Throttling , 2005, ISCA 2005.