Pre-execution via speculative data-driven multithreading
[1] Shlomit S. Pinter et al. Tango: a hardware-based data prefetching technique for superscalar processors, 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 29).
[2] Trevor Mudge et al. Improving data cache performance by pre-executing instructions under a cache miss, 1997.
[3] Augustus K. Uht et al. Disjoint eager execution: an optimal form of speculative execution, 1995, Proceedings of the 28th Annual International Symposium on Microarchitecture.
[4] Trevor N. Mudge et al. The YAGS branch prediction scheme, 1998, Proceedings of the 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[5] Eric Rotenberg et al. A study of slipstream processors, 2000, MICRO 33.
[6] James E. Smith et al. A study of branch prediction strategies, 1981, ISCA '98.
[7] Jignesh M. Patel et al. Data prefetching by dependence graph precomputation, 2001, ISCA 2001.
[8] Richard E. Kessler et al. The Alpha 21264 microprocessor, 1999, IEEE Micro.
[9] Gurindar S. Sohi et al. Speculative data-driven multithreading, 2001, Proceedings of the Seventh International Symposium on High-Performance Computer Architecture (HPCA-7).
[10] John Paul Shen et al. Instruction path coprocessors, 2000, ISCA '00.
[11] Mario Nemirovsky et al. Increasing superscalar performance through multistreaming, 1995, PACT.
[12] Mateo Valero et al. Out-of-order vector architectures, 1997, Proceedings of the 30th Annual International Symposium on Microarchitecture.
[13] Jean-Loup Baer et al. An effective on-chip preloading scheme to reduce data access penalty, 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[14] Mikko H. Lipasti et al. Correctly implementing value prediction in microprocessors that support multithreading or multiprocessing, 2001, Proceedings of the 34th ACM/IEEE International Symposium on Microarchitecture (MICRO-34).
[15] Haitham Akkary et al. A dynamic multithreading processor, 1998, Proceedings of the 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[16] Eric Rotenberg et al. Slipstream processors: improving both performance and fault tolerance, 2000, SIGP.
[17] Andreas Moshovos et al. Improving virtual function call target prediction via dependence-based pre-computation, 1999, ICS '99.
[18] Douglas J. Joseph et al. Prefetching Using Markov Predictors, 1997, Proceedings of the 24th Annual International Symposium on Computer Architecture.
[19] Rajiv Gupta et al. Predictability of load/store instruction latencies, 1993, Proceedings of the 26th Annual International Symposium on Microarchitecture.
[20] Kevin B. Theobald et al. On the limits of program parallelism and its smoothability, 1992, MICRO 1992.
[21] Todd M. Austin et al. The SimpleScalar tool set, version 2.0, 1997, CARN.
[22] Gary S. Tyson et al. Improving the accuracy and performance of memory communication through renaming, 1997, Proceedings of the 30th Annual International Symposium on Microarchitecture.
[23] Yale N. Patt et al. Target prediction for indirect jumps, 1997, ISCA '97.
[24] Todd C. Mowry et al. Compiler-based prefetching for recursive data structures, 1996, ASPLOS VII.
[25] Karel Driesen et al. The cascaded predictor: economical and adaptive branch target prediction, 1998, Proceedings of the 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[26] Harry Dwyer et al. An out-of-order superscalar processor with speculative execution and fast, precise interrupts, 1992, MICRO 1992.
[27] Richard P. Hopkins et al. Data-Driven and Demand-Driven Computer Architecture, 1982, CSUR.
[28] Karel Driesen et al. Accurate indirect branch prediction, 1998, ISCA.
[29] Stéphan Jourdan et al. A novel renaming scheme to exploit value temporal locality through physical register reuse and unification, 1998, Proceedings of the 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[30] Alan Eustace et al. ATOM: A System for Building Customized Program Analysis Tools, 1994, PLDI.
[31] Todd M. Austin et al. DIVA: a reliable substrate for deep submicron microarchitecture design, 1999, Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture (MICRO-32).
[32] Dean M. Tullsen et al. Simultaneous multithreading: Maximizing on-chip parallelism, 1995, Proceedings of the 22nd Annual International Symposium on Computer Architecture.
[33] Eric Rotenberg et al. Trace cache: a low latency approach to high bandwidth instruction fetching, 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 29).
[34] Masato Edahiro et al. A Single-Chip Multiprocessor for Smart Terminals, 2000, IEEE Micro.
[35] Alan Jay Smith et al. Branch Prediction Strategies and Branch Target Buffer Design, 1995, Computer.
[36] James R. Larus et al. Exploiting hardware performance counters with flow and context sensitive profiling, 1997, PLDI '97.
[37] Anoop Gupta et al. Two Techniques to Enhance the Performance of Memory Consistency Models, 1991, ICPP.
[38] E. Smith et al. Selective Dual Path Execution, 1996.
[39] Mikko H. Lipasti et al. Cache miss heuristics and preloading techniques for general-purpose programs, 1995, MICRO 28.
[40] G.S. Sohi et al. Dynamic instruction reuse, 1997, ISCA '97.
[41] Uri C. Weiser et al. Correlated load-address predictors, 1999, ISCA.
[42] Hwa C. Torng et al. The Concurrent Execution of Multiple Instruction Streams on Superscalar Processors, 1991, ICPP.
[43] C. Zilles et al. Understanding the backward slices of performance degrading instructions, 2000, Proceedings of the 27th International Symposium on Computer Architecture.
[44] Shai Rubin et al. Focusing processor policies via critical-path prediction, 2001, ISCA 2001.
[45] Andreas Moshovos et al. Dependence based prefetching for linked data structures, 1998, ASPLOS VIII.
[46] M. Martonosi et al. Informing Memory Operations: Providing Memory Performance Feedback in Modern Processors, 1996, Proceedings of the 23rd Annual International Symposium on Computer Architecture (ISCA '96).
[47] David W. Wall et al. Limits of instruction-level parallelism, 1991, ASPLOS IV.
[48] Andreas Moshovos et al. Memory dependence speculation tradeoffs in centralized, continuous-window superscalar processors, 2000, Proceedings of the Sixth International Symposium on High-Performance Computer Architecture (HPCA-6).
[49] Andrew R. Pleszkun et al. Implementation of precise interrupts in pipelined processors, 1985, ISCA '98.
[50] Joseph T. Rahmeh et al. Improving the accuracy of dynamic branch prediction using branch correlation, 1992, ASPLOS V.
[51] Manoj Franklin et al. The multiscalar architecture, 1993.
[52] Anne Rogers et al. Supporting dynamic data structures on distributed-memory machines, 1995, TOPL.
[53] Rajeev Balasubramonian et al. Dynamically allocating processor resources between nearby and distant ILP, 2001, ISCA 2001.
[54] Jean-Loup Baer et al. Effective Hardware Based Data Prefetching for High-Performance Processors, 1995, IEEE Trans. Computers.
[55] Dirk Grunwald et al. Selective eager execution on the PolyPath architecture, 1998, ISCA.
[56] Jeffrey Dean et al. ProfileMe: hardware support for instruction-level profiling on out-of-order processors, 1997, Proceedings of the 30th Annual International Symposium on Microarchitecture.
[57] Mikko H. Lipasti et al. Value locality and load value prediction, 1996, ASPLOS VII.
[58] Todd C. Mowry et al. The potential for using thread-level data speculation to facilitate automatic parallelization, 1998, Proceedings of the Fourth International Symposium on High-Performance Computer Architecture.
[59] Kevin O'Brien et al. Single-program speculative multithreading (SPSM) architecture: compiler-assisted fine-grained multithreading, 1995, PACT.
[60] Yale N. Patt et al. Simultaneous subordinate microthreading (SSMT), 1999, ISCA.
[61] Mikko H. Lipasti. Value locality and speculative execution, 1998.
[62] John Paul Shen et al. PipeRench implementation of the instruction path coprocessor, 2000, MICRO 33.
[63] Craig Zilles et al. Execution-based prediction using speculative slices, 2001, ISCA 2001.
[64] Jack L. Lo et al. Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor, 1996, Proceedings of the 23rd Annual International Symposium on Computer Architecture (ISCA '96).
[65] Luddy Harrison. Examination of a memory access classification scheme for pointer-intensive and numeric programs, 1996, ICS '96.
[66] Brad Calder et al. Threaded multiple path execution, 1998, Proceedings of the 25th Annual International Symposium on Computer Architecture.
[67] James E. Smith et al. The predictability of data values, 1997, Proceedings of the 30th Annual International Symposium on Microarchitecture.
[68] J.E. Smith et al. Achieving high performance via co-designed virtual machines, 1998, Innovative Architecture for Future Generation High-Performance Processors and Systems.
[69] Kenneth C. Yeager. The MIPS R10000 superscalar microprocessor, 1996, IEEE Micro.