Informing memory operations: memory performance feedback mechanisms and their applications
Margaret Martonosi | Michael D. Smith | Mark Horowitz | Todd C. Mowry
[1] Kai Li,et al. Retrospective: virtual memory mapped network interface for the SHRIMP multicomputer , 1994, ISCA '98.
[2] James Arthur Kohl,et al. A Tool to Aid in the Design, Implementation, and Understanding of Matrix Algorithms for Parallel Processors , 1990, J. Parallel Distributed Comput..
[3] Ruben W. Castelino,et al. Internal Organization of the Alpha 21164, a 300-MHz 64-bit Quad-issue CMOS RISC Microprocessor , 1995, Digit. Tech. J..
[4] Joseph A. Fisher,et al. Trace Scheduling: A Technique for Global Microcode Compaction , 1981, IEEE Transactions on Computers.
[5] Burton J. Smith. Architecture And Applications Of The HEP Multiprocessor Computer System , 1982, Optics & Photonics.
[6] Monica S. Lam,et al. The cache performance and optimizations of blocked algorithms , 1991, ASPLOS IV.
[7] Anoop Gupta,et al. SPLASH: Stanford parallel applications for shared-memory , 1992, CARN.
[8] Todd C. Mowry,et al. Compiler-based prefetching for recursive data structures , 1996, ASPLOS VII.
[9] Brian Case,et al. SPARC architecture , 1992 .
[10] Donald Yeung,et al. The MIT Alewife machine: architecture and performance , 1995, ISCA '98.
[11] James R. Larus,et al. Tempest and typhoon: user-level shared memory , 1994, ISCA '94.
[12] Anoop Gupta,et al. The Stanford Dash multiprocessor , 1992, Computer.
[13] W. Abu-Sufah,et al. Automatic program transformations for virtual memory computers , 1979, 1979 International Workshop on Managing Requirements Knowledge (MARK).
[14] Ashok Singhal,et al. Architectural support for performance tuning: a case study on the SPARCcenter 2000 , 1994, ISCA '94.
[15] Lance M. Berc,et al. Continuous profiling: where have all the cycles gone? , 1997, TOCS.
[16] Ken Kennedy,et al. Software methods for improvement of cache performance on supercomputer applications , 1989 .
[17] Anoop Gupta,et al. Scheduling and page migration for multiprocessor compute servers , 1994, ASPLOS VI.
[18] J. Robert Jump,et al. The rice parallel processing testbed , 1988, SIGMETRICS '88.
[19] Donald Yeung,et al. Sparcle: an evolutionary processor design for large-scale multiprocessors , 1993, IEEE Micro.
[20] Norman P. Jouppi,et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and pre , 1990, ISCA 1990.
[21] Kenneth C. Yeager. The Mips R10000 superscalar microprocessor , 1996, IEEE Micro.
[22] Margaret Martonosi,et al. Informing Memory Operations: Providing Memory Performance Feedback in Modern Processors , 1996, ISCA.
[24] Lance M. Berc,et al. Continuous profiling: where have all the cycles gone? , 1997, ACM Trans. Comput. Syst..
[25] Margaret Martonosi,et al. Tuning Memory Performance of Sequential and Parallel Programs , 1995, Computer.
[26] Anoop Gupta,et al. Design and evaluation of a compiler algorithm for prefetching , 1992, ASPLOS V.
[27] Ken Kennedy,et al. A Methodology for Procedure Cloning , 1993, Computer languages.
[28] Monica S. Lam,et al. A data locality optimizing algorithm , 1991, PLDI '91.
[29] Susan J. Eggers,et al. The effectiveness of multiple hardware contexts , 1994, ASPLOS VI.
[30] William Jalby,et al. Impact of Hierarchical Memory Systems On Linear Algebra Algorithm Design , 1988 .
[31] Norman P. Jouppi,et al. Complexity/performance tradeoffs with non-blocking loads , 1994, ISCA '94.
[32] Norman P. Jouppi,et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[33] Todd C. Mowry,et al. Predicting data cache misses in non-numeric applications through correlation profiling , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[34] Richard P. Paul. Sparc Architecture, Assembly Language Programming, and C , 1993 .
[35] Allan Porterfield,et al. The Tera computer system , 1990 .
[36] Margaret Martonosi,et al. Informing Loads: Enabling Software to Observe and React to Memory Behavior , 1995 .
[37] Jeffrey Dean,et al. ProfileMe: hardware support for instruction-level profiling on out-of-order processors , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[38] Todd C. Mowry,et al. Tolerating latency through software-controlled data prefetching , 1994 .
[39] Michael D. Smith,et al. Support for Speculative Execution in High-Performance Processors , 1992 .
[40] Michael C. Browne,et al. The S3.mp scalable shared memory multiprocessor , 1994, 1994 Proceedings of the Twenty-Seventh Hawaii International Conference on System Sciences.
[41] David A. Wood,et al. Cache profiling and the SPEC benchmarks: a case study , 1994, Computer.
[42] Brian N. Bershad,et al. Avoiding conflict misses dynamically in large direct-mapped caches , 1994, ASPLOS VI.
[43] Steven W. K. Tjiang,et al. Sharlit—a tool for building optimizers , 1992, PLDI '92.
[44] Allan Porterfield,et al. The Tera computer system , 1990, ICS '90.
[45] James R. Larus,et al. Fine-grain access control for distributed shared memory , 1994, ASPLOS VI.
[46] Anoop Gupta,et al. Interleaving: a multithreading technique targeting multiprocessors and workstations , 1994, ASPLOS VI.
[47] John L. Hennessy,et al. Mtool: An Integrated System for Performance Debugging Shared Memory Multiprocessor Applications , 1993, IEEE Trans. Parallel Distributed Syst..
[48] K.M. Dixit. New CPU benchmark suites from SPEC , 1992, Digest of Papers COMPCON Spring 1992.
[49] Helmar Burkhart,et al. Performance-Measurement Tools in a Multiprocessor Environment , 1989, IEEE Trans. Computers.