Memory-System Design Considerations for Dynamically-Scheduled Processors
P. Chow | N. P. Jouppi | K. I. Farkas | Z. Vranesic
[1] Norman P. Jouppi et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers, 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[2] K. K. Ramakrishnan et al. Eliminating receive livelock in an interrupt-driven kernel, 1996, TOCS.
[3] David W. Wall et al. A practical system for intermodule code optimization at link-time, 1993.
[4] Van P. Carey et al. Pool Boiling on Small Heat Dissipating Elements in Water at Subatmospheric Pressure, 1999.
[5] Todd C. Mowry et al. Compiler-based prefetching for recursive data structures, 1996, ASPLOS VII.
[6] Ruben W. Castelino et al. Internal Organization of the Alpha 21164, a 300-MHz 64-bit Quad-issue CMOS RISC Microprocessor, 1995, Digit. Tech. J.
[7] Norman P. Jouppi et al. Memory-System Design Considerations for Dynamically-Scheduled Processors, 1997, ISCA.
[8] Joel F. Bartlett et al. Compacting garbage collection with ambiguous roots, 1988, LIPO.
[9] Scott McFarling et al. Procedure merging with instruction caches, 1991, PLDI '91.
[10] P. Boyle. Electrical Evaluation Of The BIPS-0 Package, 1999.
[11] Joel F. Bartlett et al. Mostly-Copying Garbage Collection Picks Up Generations and C++, 1999.
[12] J. Mogul et al. Characterization of Organic Illumination Systems, 1989.
[13] Kourosh Gharachorloo et al. Fine-grain software distributed shared memory on SMP clusters, 1998, Proceedings 1998 Fourth International Symposium on High-Performance Computer Architecture.
[14] S. Peter Song et al. The PowerPC 604 RISC microprocessor, 1994, IEEE Micro.
[15] Norman P. Jouppi et al. How useful are non-blocking loads, stream buffers and speculative execution in multiple issue processors?, 1995, Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture.
[16] Norman P. Jouppi et al. A simulation based study of TLB performance, 1992, ISCA '92.
[17] B. K. Reid et al. The USENET cookbook: an experiment in electronic, 1989.
[18] Harry Dwyer et al. An out-of-order superscalar processor with speculative execution and fast, precise interrupts, 1992, MICRO 1992.
[19] William R. Hamburgen et al. Optimal Finned Heat Sinks, 1986.
[20] Paul John Asente et al. Editing graphical objects using procedural representations, 1988.
[21] Amitabh Srivastava et al. Unreachable procedures in object-oriented programming, 1992, LOPLAS.
[22] Jeffrey C. Mogul et al. Measured capacity of an Ethernet: myths and reality, 1988, CCRV.
[23] S. McFarling. Combining Branch Predictors, 1993.
[24] David W. Wall et al. Long Address Traces from RISC Machines: Generation and Analysis, 1999, ISCA 1989.
[25] Richard E. Kessler et al. Evaluating stream buffers as a secondary cache replacement, 1994, Proceedings of the 21st International Symposium on Computer Architecture.
[26] Silvio Turrini et al. Optimal group distribution in carry-skip adders, 1989, Proceedings of 9th Symposium on Computer Arithmetic.
[27] Silvio Turrini. Optimizations and Placement with the Genetic Workbench, 1999.
[28] John L. Hennessy et al. The priority-based coloring approach to register allocation, 1990, TOPLAS.
[29] Joel F. Bartlett et al. Transparent Controls for Interactive Graphics, 1999.
[30] Alfred V. Aho et al. Compilers: Principles, Techniques, and Tools, 1986, Addison-Wesley series in computer science / World student series edition.
[31] David A. Patterson et al. Computer Architecture: A Quantitative Approach, 1990.
[32] Scott McFarling. Cache replacement with dynamic exclusion, 1992, ISCA '92.
[33] Thomas Lengauer et al. Combinatorial algorithms for integrated circuit layout, 1990, Applicable theory in computer science.
[34] Preston Briggs et al. Register allocation via graph coloring, 1992.
[35] Jeffrey C. Mogul et al. The experimental literature of the internet: an annotated bibliography, 1989, CCRV.
[36] David F. Bacon et al. Compiler transformations for high-performance computing, 1994, CSUR.
[37] Norman P. Jouppi et al. Tradeoffs in two-level on-chip caching, 1994, ISCA '94.
[38] Jeffrey C. Mogul et al. Performance Implications of Multiple Pointer Sizes, 1995, USENIX.
[39] Jeffrey C. Mogul et al. The effect of context switches on cache performance, 1991, ASPLOS IV.
[40] John K. Ousterhout et al. Why Aren't Operating Systems Getting Faster As Fast as Hardware?, 1990, USENIX Summer.
[41] Norman P. Jouppi et al. WRL Research Report 93/5: An Enhanced Access and Cycle Time Model for On-chip Caches, 1994.
[42] J. S. Fitch et al. A comparison of acoustic and infrared inspection techniques for die attach, 1992, [1992 Proceedings] Intersociety Conference on Thermal Phenomena in Electronic Systems.
[43] Jeffrey C. Mogul et al. Network locality at the scale of processes, 1991, SIGCOMM '91.
[44] Deborah Estrin et al. Visa Protocols for Controlling Inter-Organizational Datagram Flow: Extended Description, 1989.
[45] John L. Hennessy et al. MTOOL: a method for detecting memory bottlenecks, 1991, SIGMETRICS '91.
[46] G. May Yip. Incremental, Generational Mostly-Copying Garbage Collection in Uncooperative Environments, 1999.
[47] W. Hamburgen et al. Pool boiling enhancement techniques for water at low pressure, 1991, 1991 Proceedings, Seventh IEEE Semiconductor Thermal Measurement and Management Symposium.
[48] Mark Horowitz et al. Piecewise linear models for Rsim, 1993, ICCAD.
[49] David W. Wall et al. Speculative Execution and Instruction-Level Parallelism, 1999.
[50] Guang R. Gao et al. A Register Allocation Framework Based on Hierarchical Cyclic Interval Graphs, 1992, CC.
[51] Jeremy Dion et al. Contour: a tile-based gridless router, 1995.
[52] Robert N. Mayo et al. Boolean matching for full-custom ECL gates, 1993, ICCAD '93.
[53] Sarita V. Adve et al. Shared Memory Consistency Models: A Tutorial, 1996, Computer.
[54] Richard L. Sites et al. Alpha AXP architecture, 1993, CACM.
[55] Dirk Grunwald et al. Reducing branch costs via branch alignment, 1994, ASPLOS VI.
[56] David W. Wall et al. Systems for Late Code Modification, 1991, Code Generation.
[57] Jeffrey C. Mogul et al. Operating systems support for busy Internet servers, 1995, Proceedings 5th Workshop on Hot Topics in Operating Systems (HotOS-V).
[58] Norman P. Jouppi et al. Circuit and Process Directions for Low-Voltage Swing Submicron BiCMOS, 1999.
[59] Brad Calder et al. Efficient procedure mapping using cache line coloring, 1997, PLDI '97.
[60] Norman P. Jouppi et al. Register file design considerations in dynamically scheduled processors, 1996, Proceedings. Second International Symposium on High-Performance Computer Architecture.
[61] J. S. Liptay. Design of the IBM Enterprise System/9000 high-end processor, 1992, IBM J. Res. Dev.
[62] Joel F. Bartlett et al. Ramonamap: an example of graphical groupware, 1994, UIST '94.
[63] Jeffrey C. Mogul et al. The case for persistent-connection HTTP, 1995, SIGCOMM '95.
[64] Mark Smotherman et al. Efficient DAG construction and heuristic calculation for instruction scheduling, 1991, MICRO 24.
[65] W. R. Hamburgen et al. Precise robotic paste dot dispensing, 1989, Proceedings., 39th Electronic Components Conference.
[66] Jeffrey C. Mogul et al. Simple and Flexible Datagram Access Controls for UNIX-based Gateways, 1999.
[67] David W. Wall et al. Limits of instruction-level parallelism, 1991, ASPLOS IV.
[68] Kourosh Gharachorloo et al. Shasta: a low overhead, software-only approach for supporting fine-grain shared memory, 1996, ASPLOS VII.
[69] Joel F. Bartlett et al. Don’t Fidget with Widgets, Draw!, 1999.
[70] Gurindar S. Sohi et al. Register traffic analysis for streamlining inter-operation communication in fine-grain parallel processors, 1992, MICRO 1992.
[71] Jeffrey C. Mogul et al. Observing TCP dynamics in real networks, 1992, SIGCOMM '92.
[72] Ken Kennedy et al. Software prefetching, 1991, ASPLOS IV.
[73] Kourosh Gharachorloo et al. Design and performance of the Shasta distributed shared memory protocol, 1997, ICS '97.
[74] W. Hamburgen et al. Packaging a 150-W bipolar ECL microprocessor, 1992, 1992 Proceedings 42nd Electronic Components & Technology Conference.
[75] Richard L. Sites et al. Alpha Architecture Reference Manual, 1995.
[76] Jeffrey C. Mogul et al. Efficient use of workstations for passive monitoring of local area networks, 1990, SIGCOMM '90.
[77] Charles N. Fischer et al. Probabilistic register allocation, 1992, PLDI '92.
[78] PA-8000 Combines Complexity and Speed: 11/14/94, 1994.
[79] Yale N. Patt et al. An investigation of the performance of various dynamic scheduling techniques, 1992, MICRO 1992.
[80] Todd C. Mowry et al. Tolerating latency through software-controlled data prefetching, 1994.
[81] Jeremy Dion et al. Fast Printed Circuit Board Routing, 1987, 24th ACM/IEEE Design Automation Conference.
[82] Dirk Grunwald et al. The predictability of branches in libraries, 1995, Proceedings of the 28th Annual International Symposium on Microarchitecture.
[83] Joel F. Bartlett et al. Experience with a wireless world wide web client, 1995, Digest of Papers. COMPCON'95. Technologies for the Information Superhighway.
[84] Mike Johnson et al. Superscalar microprocessor design, 1991, Prentice Hall series in innovative technology.
[85] Jeffrey C. Mogul et al. The packet filter: an efficient mechanism for user-level network code, 1987, SOSP '87.
[86] K. J. Richardson. Component Characterization for I/O Cache Designs, 1995.
[87] Norman P. Jouppi et al. Complexity/performance tradeoffs with non-blocking loads, 1994, ISCA '94.
[88] Russell Kao et al. Piecewise Linear Models for Switch-Level Simulation, 1992.
[89] Rajiv Gupta et al. Predictability of load/store instruction latencies, 1993, Proceedings of the 26th Annual International Symposium on Microarchitecture.
[90] David W. Wall et al. Global register allocation at link time, 1986, SIGPLAN '86.
[91] David W. Wall et al. Software Methods for System Address Tracing: Implementation and Validation, 1999.
[92] Joel McCormack et al. Writing fast X servers for dumb color frame buffers, 1990, Softw. Pract. Exp.
[93] Jeffrey C. Mogul et al. Network Behavior of a Busy Web Server and its Clients, 1999.
[94] John Fitch et al. A One-Dimensional Thermal Model for the VAX 9000 Multi Chip Units, 1990.
[95] N. P. Jouppi et al. Integration and packaging plateaus of processor performance, 1989, Proceedings 1989 IEEE International Conference on Computer Design: VLSI in Computers and Processors.
[96] Michael N. Nelson et al. Virtual Memory vs. The File System, 1999.
[97] John Cocke et al. A methodology for the real world, 1981.
[98] David W. Wall et al. Link-time optimization of address calculation on a 64-bit architecture, 1994, PLDI '94.
[99] David W. Wall et al. The Mahler experience: using an intermediate language as the machine description, 1987, International Conference on Architectural Support for Programming Languages and Operating Systems.
[100] Anoop Gupta et al. Design and evaluation of a compiler algorithm for prefetching, 1992, ASPLOS V.
[101] N. P. Jouppi. Architectural and organizational tradeoffs in the design of the MultiTitan CPU, 1989, ISCA '89.
[102] Dean M. Tullsen et al. Simultaneous multithreading: a platform for next-generation processors, 1997, IEEE Micro.
[103] David Kroft et al. Lockup-free instruction fetch/prefetch cache organization, 1981, ISCA '81.
[104] Keith D. Cooper et al. Improvements to graph coloring register allocation, 1994, TOPLAS.
[105] Scott A. Mahlke et al. IMPACT: an architectural framework for multiple-instruction-issue processors, 1991, ISCA '91.
[106] Dave Christie. Developing the AMD-K5 architecture, 1996, IEEE Micro.
[107] Vicki H. Allan et al. Software pipelining, 1995, CSUR.
[108] Norman P. Jouppi et al. The Distribution of Instruction-Level and Machine Parallelism and Its Effect on Performance, 1999.
[109] Christopher A. Kent et al. Cache Coherence in Distributed Systems, 1999.
[110] Janak H. Patel et al. Stride directed prefetching in scalar processors, 1992, MICRO.
[111] Jeffrey Mogul et al. Spritely NFS: Implementation and Performance of Cache-Consistency Protocols, 1989.
[112] David W. Wall et al. Experience with a software-defined machine architecture, 1992, TOPLAS.
[113] Jack L. Lo et al. Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor, 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).
[114] A. Gibbons. Algorithmic Graph Theory, 1985.
[115] K. Gharachorloo et al. Memory consistency models for shared memory multiprocessors, 1996.
[116] Kenneth C. Yeager. The Mips R10000 superscalar microprocessor, 1996, IEEE Micro.
[117] Jean-Loup Baer et al. Reducing memory latency via non-blocking and prefetching caches, 1992, ASPLOS V.
[118] Ramsey W. Haddad. Drip: A Schematic Drawing Interpreter, 1999.
[119] Norman P. Jouppi et al. Available instruction-level parallelism for superscalar and superpipelined machines, 1989, ASPLOS III.
[120] David W. Wall et al. Link-Time Code Modification, 1989.
[121] N. P. Jouppi et al. A 20-MIPS sustained 32-bit CMOS microprocessor with high ratio of sustained to peak performance, 1989.
[122] Norman P. Jouppi. Cache write policies and performance, 1993, ISCA '93.
[123] Susan J. Eggers et al. The effect on RISC performance of register set size and structure versus code generation strategy, 1991, ISCA '91.
[124] Amitabh Srivastava et al. Analysis Tools, 2019, Public Transportation Systems.
[125] David W. Wall et al. Predicting program behavior using real or estimated profiles, 2004, SIGP.
[126] Don Stark et al. Analysis of power supply networks in VLSI circuits, 1991.
[127] Susan J. Eggers et al. Balanced scheduling: instruction scheduling when memory latency is uncertain, 1993, PLDI '93.