Exploring, defining, and exploiting recent store value locality
[1] Michel Dubois, et al. Cache protocols with partial block invalidations, 1993, Proceedings of the Seventh International Parallel Processing Symposium.
[2] Paul I. Rubinfeld. Managing Problems at High Speed, 1998.
[3] Mikko H. Lipasti, et al. Verifying sequential consistency using vector clocks, 2002, SPAA '02.
[4] Jun Yang, et al. Energy-efficient load and store reuse, 2001, ISLPED '01.
[5] Mikko H. Lipasti, et al. Correctly implementing value prediction in microprocessors that support multithreading or multiprocessing, 2001, MICRO.
[6] José González, et al. The use of prediction for accelerating upgrade misses in cc-NUMA multiprocessors, 2002, Proceedings of the International Conference on Parallel Architectures and Compilation Techniques.
[7] Mikko H. Lipasti, et al. Value locality and load value prediction, 1996, ASPLOS VII.
[8] Eric Rotenberg, et al. AR-SMT: a microarchitectural approach to fault tolerance in microprocessors, 1999, Digest of Papers, Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing.
[9] F. Gabbay. Speculative Execution based on Value Prediction: Research Proposal towards the Degree of Doctor of Sciences, 1996.
[10] R. Fox. Silence is golden, 1998, Nursing Standard (Royal College of Nursing (Great Britain): 1987).
[11] David A. Wood, et al. A model for estimating trace-sample miss ratios, 1991, SIGMETRICS '91.
[12] Ben J. Catanzaro, et al. Multiprocessor System Architectures, 1994.
[13] Milo M. K. Martin, et al. Simulating a $2M Commercial Server on a $2K PC, 2001.
[14] Anoop Gupta, et al. The SPLASH-2 programs: characterization and methodological considerations, 1995, ISCA.
[15] Jim Nilsson, et al. Improving performance of load-store sequences for transaction processing workloads on multiprocessors, 1999, Proceedings of the 1999 International Conference on Parallel Processing.
[16] Mikko H. Lipasti, et al. Silent stores for free, 2000, MICRO 33.
[17] Mark D. Hill, et al. Multiprocessors Should Support Simple Memory-Consistency Models, 1998, Computer.
[18] Philip J. Woest, et al. The Wisconsin multicube: a new large-scale cache-coherent multiprocessor, 1988, ISCA '88.
[19] Gary Lauterbach, et al. UltraSPARC-III: designing third-generation 64-bit performance, 1999, IEEE Micro.
[20] T. May, et al. Alpha-particle-induced soft errors in dynamic memories, 1979, IEEE Transactions on Electron Devices.
[21] Margaret Martonosi, et al. Cache decay: exploiting generational behavior to reduce cache leakage power, 2001, ISCA 2001.
[22] Joel S. Emer, et al. Loose loops sink chips, 2002, Proceedings of the Eighth International Symposium on High-Performance Computer Architecture.
[23] Ravi Rajwar, et al. Speculative lock elision: enabling highly concurrent multithreaded execution, 2001, Proceedings of the 34th ACM/IEEE International Symposium on Microarchitecture (MICRO-34).
[24] Antonio González, et al. Reducing Memory Traffic Via Redundant Store Instructions, 1999, HPCN Europe.
[25] Steven R. Kunkel, et al. System optimization for OLTP workloads, 1999, IEEE Micro.
[26] Mikko H. Lipasti, et al. Exceeding the dataflow limit via value prediction, 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 29).
[27] Sarita V. Adve, et al. Shared Memory Consistency Models: A Tutorial, 1996, Computer.
[28] Mikko H. Lipasti, et al. Precise and Accurate Processor Simulation, 2002.
[29] Alan Charlesworth, et al. Gigaplane-XB: Extending the Ultra Enterprise Family, 1997.
[30] Luiz André Barroso, et al. Memory system characterization of commercial workloads, 1998, ISCA.
[31] Jim Nilsson, et al. Reducing ownership overhead for load-store sequences in cache-coherent multiprocessors, 2000, Proceedings of the 14th International Parallel and Distributed Processing Symposium (IPDPS 2000).
[32] Mikko H. Lipasti, et al. Redeeming IPC as a performance metric for multithreaded programs, 2003, 12th International Conference on Parallel Architectures and Compilation Techniques.
[33] Cathy May, et al. The PowerPC Architecture: A Specification for a New Family of RISC Processors, 1994.
[34] Antonia Zhai, et al. Improving value communication for thread-level speculation, 2002, Proceedings of the Eighth International Symposium on High-Performance Computer Architecture.
[35] Mikko H. Lipasti, et al. On the value locality of store instructions, 2000, Proceedings of the 27th International Symposium on Computer Architecture.
[36] James F. Ziegler, et al. Terrestrial cosmic rays, 1996, IBM J. Res. Dev.
[37] Alan Jay Smith, et al. Aspects of cache memory and instruction buffer performance, 1987.
[38] Mikko H. Lipasti, et al. Implementing optimizations at decode time, 2002, Proceedings of the 29th Annual International Symposium on Computer Architecture.
[39] Sarita V. Adve, et al. RSIM: a simulator for shared-memory multiprocessor and uniprocessor systems that exploit ILP, 1997, WCAE-3 '97.
[40] Mikko H. Lipasti, et al. Characterization of silent stores, 2000, Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques.
[41] Norman P. Jouppi. Cache write policies and performance, 1993, ISCA '93.
[42] Jay C. Borkenhagen, et al. 5th generation 64-bit PowerPC-compatible commercial processor design, 1999.
[43] R. Blahut. Theory and practice of error control codes, 1983.
[44] G. M. Amdahl. Validity of the single processor approach to achieving large scale computing capabilities, 1967, AFIPS '67 (Spring).
[45] Michel Raynal, et al. Algorithms for mutual exclusion, 1986.
[46] David E. Culler, et al. Parallel Computer Architecture: A Hardware/Software Approach, 1999, Morgan Kaufmann.
[47] Håkan Grahn, et al. Evaluation of a Competitive-Update Cache Coherence Protocol with Migratory Data Detection, 1996, J. Parallel Distributed Comput.
[48] G. E. Moore, et al. Cramming More Components Onto Integrated Circuits, 1998, Proceedings of the IEEE.
[49] K. Gharachorloo, et al. Memory consistency models for shared memory multiprocessors, 1996.
[50] Kenneth C. Yeager. The MIPS R10000 superscalar microprocessor, 1996, IEEE Micro.
[51] Sarita V. Adve, et al. Performance of database workloads on shared-memory systems with out-of-order processors, 1998, ASPLOS VIII.
[52] Phillip B. Gibbons, et al. Testing Shared Memories, 1997, SIAM J. Comput.
[53] David Sinreich. Fault Tolerance Decision in DRAM Applications, 1997.
[54] David A. Wood, et al. Dynamic self-invalidation: reducing coherence overhead in shared-memory multiprocessors, 1995, Proceedings of the 22nd Annual International Symposium on Computer Architecture.
[55] Mikko H. Lipasti, et al. Silent Stores and Store Value Locality, 2001, IEEE Trans. Computers.
[56] Andreas Moshovos, et al. Memory dependence prediction, 1998.
[57] Eiji Fujiwara, et al. Error-control coding for computer systems, 1989.
[58] Haitham Akkary, et al. A dynamic multithreading processor, 1998, Proceedings of the 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[59] Eric Rotenberg, et al. Slipstream processors: improving both performance and fault tolerance, 2000, ASPLOS IX.
[60] Balaram Sinharoy, et al. POWER4 system microarchitecture, 2002, IBM J. Res. Dev.
[61] Ravi Rajwar, et al. Transactional Lock-Free Execution of Lock-Based Programs, 2002.
[62] Livio Ricciulli, et al. The detection and elimination of useless misses in multiprocessors, 1993, ISCA '93.
[63] Josep Torrellas, et al. Speculative synchronization: applying thread-level speculation to explicitly parallel applications, 2002, ASPLOS X.
[64] Alan J. Hu, et al. Automatable Verification of Sequential Consistency, 2003, Theory of Computing Systems.
[65] Mikko H. Lipasti, et al. Constraint graph analysis of multithreaded programs, 2003, 12th International Conference on Parallel Architectures and Compilation Techniques.
[66] Shubhendu S. Mukherjee, et al. Using prediction to accelerate coherence protocols, 1998, Proceedings of the 25th Annual International Symposium on Computer Architecture.
[67] M. Martonosi, et al. Timekeeping in the memory system: predicting and optimizing memory behavior, 2002, Proceedings of the 29th Annual International Symposium on Computer Architecture.
[68] Brad Calder, et al. Value profiling, 1997, Proceedings of the 30th Annual International Symposium on Microarchitecture.
[69] Leslie Lamport. How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs, 1979, IEEE Transactions on Computers.
[70] D. Marr, et al. Hyper-Threading Technology Architecture and Microarchitecture, 2002.
[71] Josep Torrellas, et al. Eliminating squashes through learning cross-thread violations in speculative parallelization for multiprocessors, 2002, Proceedings of the Eighth International Symposium on High-Performance Computer Architecture.
[72] Todd M. Austin, et al. The SimpleScalar tool set, version 2.0, 1997, SIGARCH Computer Architecture News.
[73] John Cocke, et al. A methodology for the real world, 1981.
[74] John Cocke, et al. Register Allocation Via Coloring, 1981, Comput. Lang.
[75] James L. Walsh, et al. IBM experiments in soft fails in computer electronics (1978-1994), 1996, IBM J. Res. Dev.
[76] Manoj Franklin, et al. The multiscalar architecture, 1993.
[77] R. P. Colwell, et al. A 0.6 µm BiCMOS processor with dynamic execution, 1995, Proceedings of ISSCC '95, International Solid-State Circuits Conference.
[78] Cameron McNairy, et al. Itanium 2 Processor Microarchitecture, 2003, IEEE Micro.
[79] Babak Falsafi, et al. Memory sharing predictor: the key to a speculative coherent DSM, 1999, ISCA.
[80] B. Falsafi, et al. Selective, accurate, and timely self-invalidation using last-touch prediction, 2000, Proceedings of the 27th International Symposium on Computer Architecture.
[81] Gurindar S. Sohi, et al. Master/Slave Speculative Parallelization, 2002, Proceedings of the 35th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-35).
[82] G. Tyson, et al. Eager writeback: a technique for improving bandwidth utilization, 2000, Proceedings of the 33rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-33).
[83] Keith Diefendorff. K7 Challenges Intel: 10/26/98, 1998.
[84] Stefanos Kaxiras, et al. Improving CC-NUMA performance using Instruction-based Prediction, 1999, Proceedings of the Fifth International Symposium on High-Performance Computer Architecture.
[85] Milo M. K. Martin, et al. Timestamp snooping: an approach for extending SMPs, 2000, ASPLOS.
[86] David A. Wood, et al. Variability in architectural simulations of multi-threaded workloads, 2003, Proceedings of the Ninth International Symposium on High-Performance Computer Architecture (HPCA-9).
[87] David A. Patterson, et al. Computer Architecture: A Quantitative Approach, 2nd Edition, 1996.
[88] Michel Dubois, et al. Essential Misses and Data Traffic in Coherence Protocols, 1995, J. Parallel Distributed Comput.
[89] Erik Hagersten, et al. Race-Free Interconnection Networks and Multiprocessor Consistency, 1991, ISCA.
[90] David J. Lilja, et al. Toward Complexity-Effective Verification: A Case Study of the Cray SV2 Cache Coherence Protocol, 2000.
[91] Mikko H. Lipasti, et al. Temporally silent stores, 2002, ASPLOS X.
[92] Anoop Gupta, et al. Cache Invalidation Patterns in Shared-Memory Multiprocessors, 1992, IEEE Trans. Computers.
[93] Michel Dubois, et al. Delayed consistency and its effects on the miss rate of parallel programs, 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).