Tools and techniques for memory system design and analysis
[1] Norman P. Jouppi,et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[2] A. Gilles,et al. The Art of Computer Systems Performance Analysis (Techniques for Experimental Design, Measurement, Simulation, and Modeling) , 1992 .
[3] David A. Wood,et al. A model for estimating trace-sample miss ratios , 1991, SIGMETRICS '91.
[4] Alan L. Cox,et al. TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems , 1994, USENIX Winter.
[5] Michel Dubois,et al. Access ordering and coherence in shared memory multiprocessors , 1989 .
[6] T. von Eicken,et al. Parallel programming in Split-C , 1993, Supercomputing '93.
[7] Dionisios N. Pnevmatikatos,et al. Cache performance of the integer SPEC benchmarks on a RISC , 1990, CARN.
[8] James R. Larus,et al. EEL: machine-independent executable editing , 1995, PLDI '95.
[9] Anoop Gupta,et al. Memory consistency and event ordering in scalable shared-memory multiprocessors , 1990, ISCA '90.
[10] James R. Larus,et al. Fine-grain access control for distributed shared memory , 1994, ASPLOS VI.
[11] S. B. Prakash,et al. of Electrical and Computer Engineering , 1984 .
[12] Calvin K. Tang. Cache system design in the tightly coupled multiprocessor system , 1976, AFIPS '76.
[13] Erik Hagersten,et al. DDM - A Cache-Only Memory Architecture , 1992, Computer.
[14] Ben Zorn,et al. A memory allocation profiler for c and lisp , 1988 .
[15] Margaret Martonosi,et al. MemSpy: analyzing memory system bottlenecks in programs , 1992, SIGMETRICS '92/PERFORMANCE '92.
[16] James R. Larus,et al. Mechanisms for cooperative shared memory , 1993, ISCA '93.
[17] Gregory R. Andrews,et al. Distributed filaments: efficient fine-grain parallelism on a cluster of workstations , 1994, OSDI '94.
[18] M. Hill,et al. Weak ordering-a new definition , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[19] Anoop Gupta,et al. Memory consistency and event ordering in scalable shared-memory multiprocessors , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[20] Alan Jay Smith,et al. Two Methods for the Efficient Analysis of Memory Address Trace Data , 1977, IEEE Transactions on Software Engineering.
[21] Pen-Chung Yew,et al. A compiler-directed cache coherence scheme with improved intertask locality , 1994, Proceedings of Supercomputing '94.
[22] Ann Marie Grizzaffi Maynard,et al. Contrasting characteristics and cache performance of technical and multi-user commercial workloads , 1994, ASPLOS VI.
[23] Robert J. Fowler,et al. Adaptive cache coherency for detecting migratory shared data , 1993, ISCA '93.
[24] Norman P. Jouppi,et al. Tradeoffs in two-level on-chip caching , 1994, ISCA '94.
[25] Alan Jay Smith,et al. Evaluating Associativity in CPU Caches , 1989, IEEE Trans. Computers.
[26] Mats Brorsson,et al. An adaptive cache coherence protocol optimized for migratory sharing , 1993, ISCA '93.
[27] John L. Hennessy,et al. Multiprocessor Simulation and Tracing Using Tango , 1991, ICPP.
[28] James R. Larus,et al. Cachier: A Tool for Automatically Inserting CICO Annotations , 1994, 1994 International Conference on Parallel Processing Vol. 2.
[29] Irving L. Traiger,et al. Evaluation Techniques for Storage Hierarchies , 1970, IBM Syst. J..
[30] Trevor York,et al. Book Review: Principles of CMOS VLSI Design: A Systems Perspective , 1986 .
[31] James R. Larus,et al. The Wisconsin Wind Tunnel: virtual prototyping of parallel computers , 1993, SIGMETRICS '93.
[32] S. Abraham,et al. Efficient Simulation of Multiple Cache Configurations Using Binomial Trees , 1991 .
[33] Leslie Lamport,et al. How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs , 1979, IEEE Transactions on Computers.
[34] Thomas Roberts Puzak,et al. Analysis of cache replacement-algorithms , 1985 .
[35] James R. Larus,et al. Cooperative shared memory: software and hardware for scalable multiprocessors , 1993, TOCS.
[36] Anoop Gupta,et al. SPLASH: Stanford parallel applications for shared-memory , 1992, CARN.
[37] James R. Larus,et al. Tempest and typhoon: user-level shared memory , 1994, ISCA '94.
[38] Alan Jay Smith,et al. Line (Block) Size Choice for CPU Cache Memories , 1987, IEEE Transactions on Computers.
[39] Anant Agarwal,et al. LimitLESS directories: A scalable cache coherence scheme , 1991, ASPLOS IV.
[40] Mark Horowitz,et al. An evaluation of directory schemes for cache coherence , 1998, ISCA '98.
[41] Anoop Gupta,et al. The Stanford Dash multiprocessor , 1992, Computer.
[42] Alexander V. Veidenbaum,et al. Compiler-directed cache management in multiprocessors , 1990, Computer.
[43] Henry M. Levy,et al. Hardware and software support for efficient exception handling , 1994, ASPLOS VI.
[44] John H. Edmondson,et al. Superscalar instruction execution in the 21164 Alpha microprocessor , 1995, IEEE Micro.
[45] James R. Larus,et al. Efficient program tracing , 1993, Computer.
[46] Richard E. Kessler,et al. Page placement algorithms for large real-indexed caches , 1992, TOCS.
[47] Anoop Gupta,et al. Comparative evaluation of latency reducing and tolerating techniques , 1991, ISCA '91.
[48] David A. Wood,et al. Cache profiling and the SPEC benchmarks: a case study , 1994, Computer.
[49] Anoop Gupta,et al. The Stanford FLASH multiprocessor , 1994, ISCA '94.
[50] Margaret Martonosi,et al. Effectiveness of trace sampling for performance debugging tools , 1993, SIGMETRICS '93.
[51] David W. Wall,et al. Generation and analysis of very long address traces , 1990, ISCA '90.
[52] Ken Kennedy,et al. Software methods for improvement of cache performance on supercomputer applications , 1989 .
[53] Raj Jain,et al. The art of computer systems performance analysis - techniques for experimental design, measurement, simulation, and modeling , 1991, Wiley professional computing.
[54] Norman P. Jouppi,et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and pre , 1990, ISCA 1990.
[55] Anoop Gupta,et al. Tolerating Latency Through Software-Controlled Prefetching in Shared-Memory Multiprocessors , 1991, J. Parallel Distributed Comput..
[56] Thomas E. Anderson,et al. The Performance Implications of Spin-Waiting Alternatives for Shared-Memory Multiprocessors , 1989, ICPP.
[57] Susan L. Graham,et al. Gprof: A call graph execution profiler , 1982, SIGPLAN '82.
[58] John L. Hennessy,et al. Mtool: An Integrated System for Performance Debugging Shared Memory Multiprocessor Applications , 1993, IEEE Trans. Parallel Distributed Syst..
[59] K. Kavi. Cache Memories Cache Memories in Uniprocessors. Reading versus Writing. Improving Performance , 2022 .
[60] David A. Wood,et al. Dynamic self-invalidation: reducing coherence overhead in shared-memory multiprocessors , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[61] Paul Feautrier,et al. A New Solution to Coherence Problems in Multicache Systems , 1978, IEEE Transactions on Computers.
[62] Peter Yan-Tek Hsu. Designing the TFP microprocessor , 1994, IEEE Micro.
[63] David B. Whalley,et al. Fast instruction cache performance evaluation using compile-time analysis , 1992, SIGMETRICS '92/PERFORMANCE '92.
[64] Thomas E. Anderson,et al. Efficient software-based fault isolation , 1993 .
[65] David A. Wood,et al. Active memory: a new abstraction for memory-system simulation , 1995, SIGMETRICS '95/PERFORMANCE '95.
[66] Dionisios N. Pnevmatikatos,et al. Cache performance of the SPEC92 benchmark suite , 1993, IEEE Micro.
[67] Ken Chan,et al. PA7200: a PA-RISC processor with integrated high performance MP bus interface , 1994, Proceedings of COMPCON '94.
[68] Sang Lyul Min,et al. Design and Analysis of a Scalable Cache Coherence Scheme Based on Clocks and Timestamps , 1992, IEEE Trans. Parallel Distributed Syst..
[69] Mark D. Hill,et al. Implementing Sequential Consistency in Cache-Based Systems , 1990, ICPP.
[70] K. Kennedy,et al. Cache coherence using local knowledge , 1993, Supercomputing '93.
[71] Kevin P. McAuliffe,et al. Automatic Management of Programmable Caches , 1988, ICPP.
[72] Trevor N. Mudge,et al. Trap-driven simulation with Tapeworm II , 1994, ASPLOS VI.
[73] Robert Wahbe,et al. Efficient software-based fault isolation , 1994, SOSP '93.
[74] Babak Falsafi,et al. Kernel Support for the Wisconsin Wind Tunnel , 1993, USENIX Microkernels and Other Kernel Architectures Symposium.
[75] Michel Dubois,et al. Memory access buffering in multiprocessors , 1998, ISCA '98.
[76] W. Kent Fuchs,et al. TRAPEDS: producing traces for multicomputers via execution driven simulation , 1989, SIGMETRICS '89.
[77] Michel Dubois,et al. Combined performance gains of simple cache protocol extensions , 1994, ISCA '94.
[78] James R. Larus,et al. LCM: memory system support for parallel language implementation , 1994, ASPLOS VI.
[79] David Keppel,et al. Shade: a fast instruction-set simulator for execution profiling , 1994, SIGMETRICS.
[80] Burton J. Smith. Architecture And Applications Of The HEP Multiprocessor Computer System , 1982, Optics & Photonics.
[81] Monica S. Lam,et al. The cache performance and optimizations of blocked algorithms , 1991, ASPLOS IV.
[82] Jeffrey F. Naughton,et al. Cache Conscious Algorithms for Relational Query Processing , 1994, VLDB.
[83] Andrea C. Arpaci-Dusseau,et al. Parallel programming in Split-C , 1993, Supercomputing '93. Proceedings.
[84] David E. Culler,et al. A case for NOW (networks of workstation) , 1995, PODC '95.