Informing Memory Operations: Providing Memory Performance Feedback in Modern Processors
M. Martonosi | Michael D. Smith | T.C. Mowry | M. Horowitz
[1] Richard P. Paul. Sparc Architecture, Assembly Language Programming, and C , 1993 .
[2] Norman P. Jouppi,et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[3] K.M. Dixit. New CPU benchmark suites from SPEC , 1992, Digest of Papers COMPCON Spring 1992.
[4] Seth Copen Goldstein,et al. Active messages: a mechanism for integrating communication and computation , 1998, ISCA '98.
[5] Michael C. Browne,et al. The S3.mp scalable shared memory multiprocessor , 1994, 1994 Proceedings of the Twenty-Seventh Hawaii International Conference on System Sciences.
[6] Scott A. Mahlke,et al. Data access microarchitectures for superscalar processors with compiler-assisted data prefetching , 1991, MICRO 24.
[7] James R. Larus,et al. Fine-grain access control for distributed shared memory , 1994, ASPLOS VI.
[8] James Arthur Kohl,et al. A Tool to Aid in the Design, Implementation, and Understanding of Matrix Algorithms for Parallel Processors , 1990, J. Parallel Distributed Comput..
[9] Allan Porterfield,et al. The Tera computer system , 1990 .
[10] António de Brito Ferrari. Sparc® architecture, assembly language programming, & C : Richard P Paul Prentice-Hall Inc, Englewood Cliffs, NJ, USA (1994) ISBN 0 13 876889 7, £34.75, 448 pp , 1995, Microprocess. Microsystems.
[11] Margaret Martonosi,et al. Tuning Memory Performance of Sequential and Parallel Programs , 1995, Computer.
[12] Anoop Gupta,et al. Design and evaluation of a compiler algorithm for prefetching , 1992, ASPLOS V.
[13] Donald Yeung,et al. Sparcle: an evolutionary processor design for large-scale multiprocessors , 1993, IEEE Micro.
[14] Susan J. Eggers,et al. The effectiveness of multiple hardware contexts , 1994, ASPLOS VI.
[15] Brian N. Bershad,et al. Avoiding conflict misses dynamically in large direct-mapped caches , 1994, ASPLOS VI.
[16] Norman P. Jouppi,et al. Complexity/performance tradeoffs with non-blocking loads , 1994, ISCA '94.
[17] Ruben W. Castelino,et al. Internal Organization of the Alpha 21164, a 300-MHz 64-bit Quad-issue CMOS RISC Microprocessor , 1995, Digit. Tech. J..
[18] David A. Wood,et al. Cache profiling and the SPEC benchmarks: a case study , 1994, Computer.
[19] Anoop Gupta,et al. The Stanford FLASH multiprocessor , 1994, ISCA '94.
[20] Anoop Gupta,et al. Interleaving: a multithreading technique targeting multiprocessors and workstations , 1994, ASPLOS VI.
[21] Ricardo Bianchini,et al. The MIT Alewife machine: architecture and performance , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[22] David J. Kuck,et al. Automatic program transformations for virtual memory computers , 1979 .
[23] Helmar Burkhart,et al. Performance-Measurement Tools in a Multiprocessor Environment , 1989, IEEE Trans. Computers.
[24] Monica S. Lam,et al. A data locality optimizing algorithm , 1991, PLDI '91.
[25] W. Abu-Sufah,et al. Automatic program transformations for virtual memory computers , 1979, 1979 International Workshop on Managing Requirements Knowledge (MARK).
[26] Seth Copen Goldstein,et al. Active Messages: A Mechanism for Integrated Communication and Computation , 1992, [1992] Proceedings the 19th Annual International Symposium on Computer Architecture.
[27] William Jalby,et al. Impact of Hierarchical Memory Systems On Linear Algebra Algorithm Design , 1988 .
[28] Margaret Martonosi,et al. Informing Loads: Enabling Software to Observe and React to Memory Behavior , 1995 .
[29] Kai Li,et al. Retrospective: virtual memory mapped network interface for the SHRIMP multicomputer , 1994, ISCA '98.
[30] Ken Kennedy,et al. Software methods for improvement of cache performance on supercomputer applications , 1989 .
[31] Anoop Gupta,et al. Scheduling and page migration for multiprocessor compute servers , 1994, ASPLOS VI.
[32] Norman P. Jouppi,et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and pre , 1990, ISCA 1990.
[33] James R. Larus,et al. Tempest and typhoon: user-level shared memory , 1994, ISCA '94.
[34] Ashok Singhal,et al. Architectural support for performance tuning: a case study on the SPARCcenter 2000 , 1994, ISCA '94.
[35] Henry M. Levy,et al. Hardware and software support for efficient exception handling , 1994, ASPLOS VI.
[36] John L. Hennessy,et al. Mtool: An Integrated System for Performance Debugging Shared Memory Multiprocessor Applications , 1993, IEEE Trans. Parallel Distributed Syst..
[37] S. K. Reinhardt,et al. Tempest and typhoon , 1994 .
[38] A. Childs,et al. Assembly language programming , 1979, Proceedings of the IEEE.
[39] Burton J. Smith. Architecture And Applications Of The HEP Multiprocessor Computer System , 1982, Optics & Photonics.
[40] Brian Case,et al. SPARC architecture , 1992 .
[41] Donald Yeung,et al. The MIT Alewife machine: architecture and performance , 1995, ISCA '98.
[42] J. Robert Jump,et al. The rice parallel processing testbed , 1988, SIGMETRICS '88.