Informing Loads: Enabling Software to Observe and React to Memory Behavior
Margaret Martonosi | Michael D. Smith | Mark Horowitz | Todd C. Mowry
[1] R. Dreisbach,et al. STANFORD UNIVERSITY. , 1914, Science.
[2] A. C. McKellar,et al. The organization of matrices and matrix operations in a paged multiprogramming environment , 1968 .
[3] David A. Patterson,et al. Computer Architecture: A Quantitative Approach , 1990 .
[4] W. Abu-Sufah,et al. Automatic program transformations for virtual memory computers , 1979, 1979 International Workshop on Managing Requirements Knowledge (MARK).
[5] A. Childs,et al. Assembly language programming , 1979, Proceedings of the IEEE.
[6] Joseph A. Fisher,et al. Trace Scheduling: A Technique for Global Microcode Compaction , 1981, IEEE Transactions on Computers.
[7] David Kroft,et al. Lockup-free instruction fetch/prefetch cache organization , 1981, ISCA '81.
[8] Gene H. Golub,et al. Matrix computations , 1983 .
[9] William Jalby,et al. Impact of Hierarchical Memory Systems On Linear Algebra Algorithm Design , 1988 .
[10] Robert J. Fowler,et al. The implementation of a coherent memory abstraction on a NUMA multiprocessor: experiences with platinum , 1989, SOSP '89.
[11] Michael L. Scott,et al. Simple but effective techniques for NUMA memory management , 1989, SOSP '89.
[12] Ken Kennedy,et al. Software methods for improvement of cache performance on supercomputer applications , 1989 .
[13] Helmar Burkhart,et al. Performance-Measurement Tools in a Multiprocessor Environment , 1989, IEEE Trans. Computers.
[15] Norman P. Jouppi,et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[16] Monica S. Lam,et al. The cache performance and optimizations of blocked algorithms , 1991, ASPLOS IV.
[17] Scott A. Mahlke,et al. Data access microarchitectures for superscalar processors with compiler-assisted data prefetching , 1991, MICRO 24.
[18] Carla Schlatter Ellis,et al. Experimental comparison of memory management policies for NUMA multiprocessors , 1991, TOCS.
[19] Michael D. Smith,et al. Tracing with Pixie , 1991 .
[20] Monica S. Lam,et al. A data locality optimizing algorithm , 1991, PLDI '91.
[21] Richard E. Kessler,et al. Page placement algorithms for large real-indexed caches , 1992, TOCS.
[22] Anoop Gupta,et al. SPLASH: Stanford parallel applications for shared-memory , 1992, CARN.
[23] Burton M. Leary,et al. A 200 MHz 64 b dual-issue CMOS microprocessor , 1992, 1992 IEEE International Solid-State Circuits Conference Digest of Technical Papers.
[24] Michael D. Smith,et al. Support for Speculative Execution in High-Performance Processors , 1992 .
[25] Brian Case,et al. SPARC architecture , 1992 .
[26] R. L. Stewart,et al. The Design of the DEC 3000 AXP Systems, Two High-performance Workstations , 1992, Digit. Tech. J..
[27] K.M. Dixit. New CPU benchmark suites from SPEC , 1992, Digest of Papers COMPCON Spring 1992.
[28] Anoop Gupta,et al. Design and evaluation of a compiler algorithm for prefetching , 1992, ASPLOS V.
[29] James R. Larus,et al. The Wisconsin Wind Tunnel: virtual prototyping of parallel computers , 1993, SIGMETRICS '93.
[30] John L. Hennessy,et al. Mtool: An Integrated System for Performance Debugging Shared Memory Multiprocessor Applications , 1993, IEEE Trans. Parallel Distributed Syst..
[31] Richard P. Paul. Sparc Architecture, Assembly Language Programming, and C , 1993 .
[32] Anoop Gupta,et al. The Stanford FLASH multiprocessor , 1994, ISCA '94.
[33] Margaret Martonosi,et al. Analyzing and tuning memory performance in sequential and parallel programs , 1994 .
[34] Anoop Gupta,et al. Interleaving: a multithreading technique targeting multiprocessors and workstations , 1994, ASPLOS VI.
[35] Todd C. Mowry,et al. Tolerating latency through software-controlled data prefetching , 1994 .
[36] David A. Wood,et al. Cache profiling and the SPEC benchmarks: a case study , 1994, Computer.
[37] Brian N. Bershad,et al. Avoiding conflict misses dynamically in large direct-mapped caches , 1994, ASPLOS VI.
[38] Anoop Gupta,et al. Scheduling and page migration for multiprocessor compute servers , 1994, ASPLOS VI.
[39] James P. Laudon,et al. Architectural and Implementation Tradeoffs for Multiple-Context Processors , 1995 .