Microprocessor Architecture: From Simple Pipelines to Chip Multiprocessors
[1] Michel Dubois, et al. Memory access buffering in multiprocessors, 1998, ISCA '98.
[2] Brad Calder, et al. Dynamic prediction of critical path instructions, 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.
[3] Zarka Cvetanovic, et al. Performance characterization of the Alpha 21164 microprocessor using TP and SPEC workloads, 1996, Proceedings. Second International Symposium on High-Performance Computer Architecture.
[4] Andrew R. Pleszkun, et al. Implementing Precise Interrupts in Pipelined Processors, 1988, IEEE Trans. Computers.
[5] Peter J. Denning. Virtual Memory, 1996, ACM Comput. Surv.
[6] Trevor N. Mudge, et al. Trace-driven memory simulation: a survey, 1997, CSUR.
[7] T. Lovett, et al. STiNG: A CC-NUMA Computer System for the Commercial Marketplace, 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).
[8] Trevor N. Mudge, et al. A comparison of two pipeline organizations, 1994, Proceedings of MICRO-27. The 27th Annual IEEE/ACM International Symposium on Microarchitecture.
[9] Harsh Sharangpani, et al. Itanium Processor Microarchitecture, 2000, IEEE Micro.
[10] B. J. Smith, et al. A pipelined, shared resource MIMD computer, 1986.
[11] James E. Smith, et al. The microarchitecture of superscalar processors, 1995, Proc. IEEE.
[12] Steven K. Reinhardt, et al. A fully associative software-managed cache design, 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[13] Hans Mulder, et al. Introducing the IA-64 Architecture, 2000, IEEE Micro.
[14] Manoj Franklin, et al. Scalability Aspects of Instruction Distribution Algorithms for Clustered Processors, 2005, IEEE Trans. Parallel Distributed Syst.
[15] Trevor N. Mudge, et al. The YAGS branch prediction scheme, 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[16] Scott Shenker, et al. Scheduling for reduced CPU energy, 1994, OSDI '94.
[17] Richard E. Kessler, et al. The Alpha 21264 microprocessor, 1999, IEEE Micro.
[18] Gurindar S. Sohi, et al. Speculative Multithreaded Processors, 2001, Computer.
[19] Gene M. Amdahl. Validity of the single processor approach to achieving large scale computing capabilities, 1967, AFIPS '67 (Spring).
[20] Jack B. Dennis, et al. A preliminary architecture for a basic data-flow processor, 1974.
[21] Norman P. Jouppi, et al. The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays, 2002, ISCA.
[22] Ken Mai, et al. The future of wires, 2001, Proc. IEEE.
[23] James E. Smith, et al. Complexity-Effective Superscalar Processors, 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[24] G. E. Moore. Cramming More Components Onto Integrated Circuits, 1998, Proceedings of the IEEE.
[25] Jean-Loup Baer, et al. Effective Hardware Based Data Prefetching for High-Performance Processors, 1995, IEEE Trans. Computers.
[26] Dirk Grunwald, et al. Next cache line and set prediction, 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[27] Yale N. Patt, et al. Alternative implementations of two-level adaptive branch prediction, 1992.
[28] Wei Huang, et al. Temperature-aware microarchitecture, 2003.
[29] V. Klema. LINPACK user's guide, 1980.
[30] Shreekant S. Thakkar, et al. The Symmetry Multiprocessor System, 1988, ICPP.
[31] Marc Tremblay, et al. High-performance throughput computing, 2005, IEEE Micro.
[32] James R. Larus, et al. Transactional memory, 2008, CACM.
[33] Kenneth C. Yeager. The MIPS R10000 superscalar microprocessor, 1996, IEEE Micro.
[34] Dave Christie. Developing the AMD-K5 architecture, 1996, IEEE Micro.
[35] Daniel Citron, et al. The harmonic or geometric mean: does it really matter?, 2006, CARN.
[36] William J. Dally, et al. Programmable Stream Processors, 2003, Computer.
[37] Onur Mutlu, et al. Runahead execution: an alternative to very large instruction windows for out-of-order processors, 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings.
[38] Frank P.E. Baetke. The CONVEX Exemplar SPP1000 and SPP1200—New Scalable Parallel Systems with a Virtual Shared Memory Architecture, 1995.
[39] Yale N. Patt, et al. Recovery requirements of branch prediction storage structures in the presence of mispredicted-path execution, 2007, International Journal of Parallel Programming.
[40] David A. Patterson, et al. Computer Organization & Design: The Hardware/Software Interface, 1993.
[41] M. Tremblay, et al. UltraSparc I: a four-issue processor supporting multimedia, 1996, IEEE Micro.
[42] Yale N. Patt, et al. On pipelining dynamic instruction scheduling logic, 2000, MICRO 33.
[43] Anoop Gupta, et al. Two Techniques to Enhance the Performance of Memory Consistency Models, 1991, ICPP.
[44] C.B. Stunkel, et al. A New Switch Chip for IBM RS/6000 SP Systems, 1999, ACM/IEEE SC 1999 Conference (SC'99).
[45] Lizy Kurian John, et al. More on finding a single number to indicate overall performance of a benchmark suite, 2004, CARN.
[46] L. W. Tucker, et al. Architecture and applications of the Connection Machine, 1988, Computer.
[47] Michael Franz, et al. Power reduction techniques for microprocessor systems, 2005, CSUR.
[48] James K. Archibald, et al. Cache coherence protocols: evaluation using a multiprocessor simulation model, 1986, TOCS.
[49] Laszlo A. Belady. A Study of Replacement Algorithms for a Virtual-Storage Computer, 1966, IBM Syst. J.
[50] Thomas E. Anderson, et al. Execution characteristics of desktop applications on Windows NT, 1998.
[51] Dean M. Tullsen, et al. Simultaneous multithreading: a platform for next-generation processors, 1997, IEEE Micro.
[52] Larry Rudolph, et al. Dynamic decentralized cache schemes for MIMD parallel processors, 1984, ISCA 1984.
[53] J. S. Liptay, et al. Design of the IBM Enterprise System/9000 high-end processor, 1992, IBM J. Res. Dev.
[54] Thomas E. Anderson. The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors, 1990, IEEE Trans. Parallel Distributed Syst.
[55] Robert M. Keller, et al. Look-Ahead Processors, 1975, CSUR.
[56] Marc Tremblay, et al. The MAJC Architecture: A Synthesis of Parallelism and Scalability, 2000, IEEE Micro.
[57] Kevin Skadron, et al. Temperature-aware microarchitecture, 2003, ISCA '03.
[58] Dirk Grunwald, et al. Predictive sequential associative cache, 1996, Proceedings. Second International Symposium on High-Performance Computer Architecture.
[59] Brian N. Bershad, et al. Execution characteristics of desktop applications on Windows NT, 1998, ISCA.
[60] Yale N. Patt, et al. HPSm, a high performance restricted data flow architecture having minimal functionality, 1986, ISCA '98.
[61] Craig B. Zilles, et al. A criticality analysis of clustering in superscalar processors, 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).
[62] Yale N. Patt, et al. A comprehensive instruction fetch mechanism for a processor supporting speculative execution, 1992, MICRO 1992.
[63] David J. Lilja, et al. Data prefetch mechanisms, 2000, CSUR.
[64] Emerson W. Pugh, et al. IBM's 360 and early 370 systems, 1991.
[65] Trevor N. Mudge, et al. Analysis of branch prediction via data compression, 1996, ASPLOS VII.
[66] William J. Dally. Virtual-channel flow control, 1990, ISCA '90.
[67] Pat Conway, et al. The AMD Opteron Processor for Multiprocessor Servers, 2003, IEEE Micro.
[68] G.S. Sohi, et al. Dynamic Speculation and Synchronization of Data Dependences, 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[69] Maurice V. Wilkes. Slave Memories and Dynamic Storage Allocation, 1965, IEEE Trans. Electron. Comput.
[70] Zhao Zhang, et al. A permutation-based page interleaving scheme to reduce row-buffer conflicts and exploit data locality, 2000, MICRO 33.
[71] Donald Yeung, et al. The MIT Alewife machine: architecture and performance, 1995, ISCA '98.
[72] Yoichi Muraoka, et al. TRANQUIL: a language for an array processing computer, 1969, AFIPS '69 (Spring).
[73] Dileep Bhandarkar. Alpha Implementations and Architecture: Complete Reference and Guide, 1996.
[74] David B. Papworth. Tuning the Pentium Pro microarchitecture, 1996, IEEE Micro.
[75] Trevor N. Mudge, et al. High-Performance DRAMs in Workstation Environments, 2001, IEEE Trans. Computers.
[76] Carl J. Conti, et al. Structural Aspects of the System/360 Model 85 I: General Organization, 1968, IBM Syst. J.
[77] Richard Crisp, et al. Direct RAMbus technology: the new main memory standard, 1997, IEEE Micro.
[78] Allan Hartstein, et al. The optimum pipeline depth for a microprocessor, 2002, ISCA.
[79] Rastislav Bodík, et al. Slack: maximizing performance under technological constraints, 2002, ISCA.
[80] J. E. Thornton. Parallel operation in the Control Data 6600, 1964, AFIPS '64 (Fall, part II).
[81] Mark D. Hill. Multiprocessors Should Support Simple Memory-Consistency Models, 1998, Computer.
[82] Monica S. Lam, et al. RETROSPECTIVE: Software Pipelining: An Effective Scheduling Technique for VLIW Machines, 1998.
[83] Janak H. Patel, et al. A low-overhead coherence solution for multiprocessors with private cache memories, 1984, ISCA '84.
[84] Michael J. Flynn. Very high-speed computing systems, 1966.
[85] David A. Koufaty, et al. Hyperthreading Technology in the Netburst Microarchitecture, 2003, IEEE Micro.
[86] Richard E. Kessler, et al. Performance analysis of the Alpha 21264-based Compaq ES40 system, 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[87] S. F. Anderson, et al. The IBM System/360 Model 91: floating-point execution unit, 1967.
[88] R. E. Kessler, et al. Inexpensive implementations of set-associativity, 1989, ISCA '89.
[89] Margaret Martonosi, et al. Speculative Updates of Local and Global Branch History: A Quantitative Analysis, 2000, J. Instr. Level Parallelism.
[90] Brian A. Wichmann, et al. A Synthetic Benchmark, 1976, Comput. J.
[91] Richard E. Kessler, et al. Evaluating stream buffers as a secondary cache replacement, 1994, Proceedings of 21st International Symposium on Computer Architecture.
[92] Barry Fagin, et al. Partial resolution in branch target buffers, 1995, MICRO 1995.
[93] Norman P. Jouppi. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers, 1990, ISCA 1990.
[94] Trevor Mudge, et al. Drowsy instruction caches: Leakage power reduction using dynamic voltage scaling and cache sub-bank prediction, 2002, 35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002. (MICRO-35). Proceedings.
[95] Dirk Grunwald, et al. Fast and accurate instruction fetch and branch prediction, 1994, ISCA '94.
[96] Edward M. Riseman, et al. The Inhibition of Potential Parallelism by Conditional Jumps, 1972, IEEE Transactions on Computers.
[97] Michael J. Flynn, et al. Detection and Parallel Execution of Independent Instructions, 1970, IEEE Transactions on Computers.
[98] Dirk Grunwald, et al. A stateless, content-directed data prefetching mechanism, 2002, ASPLOS X.
[99] Wen-Hann Wang, et al. On the inclusion properties for multi-level cache hierarchies, 1988, ISCA '88.
[100] H. Peter Hofstee, et al. Introduction to the Cell multiprocessor, 2005, IBM J. Res. Dev.
[101] David J. Sager, et al. The microarchitecture of the Pentium 4 processor, 2001.
[102] Mikko H. Lipasti, et al. Modern Processor Design: Fundamentals of Superscalar Processors, 2002.
[103] Anant Agarwal, et al. APRIL: a processor architecture for multiprocessing, 1990, ISCA '90.
[104] Alan Jay Smith, et al. Multimedia extensions for general purpose microprocessors: a survey, 2005, Microprocess. Microsystems.
[105] Susan J. Eggers, et al. An analysis of database workload performance on simultaneous multithreaded processors, 1998, ISCA.
[106] J. E. Thornton. Design of a Computer: The Control Data 6600, 1970.
[107] Babak Falsafi, et al. Dead-block prediction & dead-block correlating prefetchers, 2001, ISCA 2001.
[108] David A. Patterson, et al. Computer Architecture: A Quantitative Approach, 1969.
[109] James Archibald, et al. An economical solution to the cache coherence problem, 1984, ISCA 1984.
[110] Avi Mendelson, et al. CMP Implementation in Systems Based on the Intel Core Duo Processor, 2006.
[111] John Kubiatowicz, et al. The MIT Alewife machine, 1995.
[112] Alon Naveh, et al. Power and Thermal Management in the Intel Core Duo Processor, 2006.
[113] Steven R. Kunkel, et al. A multithreaded PowerPC processor for commercial servers, 2000, IBM J. Res. Dev.
[114] Anoop Gupta, et al. Parallel Computer Architecture: A Hardware/Software Approach, 1998.
[115] Shreekant S. Thakkar, et al. Synchronization algorithms for shared-memory multiprocessors, 1990, Computer.
[116] Kanad Ghose, et al. Reducing power requirements of instruction scheduling through dynamic allocation of multiple datapath resources, 2001, Proceedings. 34th ACM/IEEE International Symposium on Microarchitecture. MICRO-34.
[117] Anoop Gupta, et al. Tolerating Latency Through Software-Controlled Prefetching in Shared-Memory Multiprocessors, 1991, J. Parallel Distributed Comput.
[118] Peter M. Kogge. The Architecture of Pipelined Computers, 1981.
[119] Jean-Loup Baer, et al. Modified LRU policies for improving second-level cache behavior, 2000, Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6 (Cat. No.PR00550).
[120] James E. Smith. A study of branch prediction strategies, 1981, ISCA '98.
[121] Francis F. Lee, et al. Study of "Look-Aside" Memory, 1969, IEEE Transactions on Computers.
[122] Scott A. Mahlke, et al. Integrated predicated and speculative execution in the IMPACT EPIC architecture, 1998, ISCA.
[123] John H. Edmondson, et al. Superscalar instruction execution in the 21164 Alpha microprocessor, 1995, IEEE Micro.
[124] Leonard Kleinrock, et al. Virtual Cut-Through: A New Computer Communication Switching Technique, 1979, Comput. Networks.
[125] Alan Jay Smith, et al. Functional Implementation Techniques for CPU Cache Memories, 1999, IEEE Trans. Computers.
[126] Alan Jay Smith, et al. Aspects of cache memory and instruction buffer performance, 1987.
[127] Thomas M. Conte, et al. Unified assign and schedule: a new approach to scheduling for clustered register file microarchitectures, 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[128] Jack W. Davidson, et al. Profile guided code positioning, 1990, SIGP.
[129] A. J. KleinOsowski, et al. MinneSPEC: A New SPEC Benchmark Workload for Simulation-Based Computer Architecture Research, 2002, IEEE Computer Architecture Letters.
[130] Cameron McNairy, et al. Itanium 2 Processor Microarchitecture, 2003, IEEE Micro.
[131] Jack J. Dongarra, et al. The LINPACK Benchmark: past, present and future, 2003, Concurr. Comput. Pract. Exp.
[132] Chris Wilkerson, et al. Locality vs. criticality, 2001, ISCA 2001.
[133] James E. Smith. Characterizing computer performance with a single number, 1988, CACM.
[134] Anant Agarwal, et al. Column-associative caches: a technique for reducing the miss rate of direct-mapped caches, 1993, ISCA '93.
[135] Antonio González, et al. Energy-effective issue logic, 2001, ISCA 2001.
[136] Sarita V. Adve, et al. Shared Memory Consistency Models: A Tutorial, 1996, Computer.
[137] Joseph T. Rahmeh, et al. Improving the accuracy of dynamic branch prediction using branch correlation, 1992, ASPLOS V.
[138] D.R. Kaeli, et al. Branch history table prediction of moving target branches due to subroutine returns, 1991, [1991] Proceedings. The 18th Annual International Symposium on Computer Architecture.
[139] Yale N. Patt, et al. Alternative implementations of two-level adaptive branch prediction, 1992, ISCA '92.
[140] Mary K. Vernon, et al. Efficient synchronization primitives for large-scale cache-coherent multiprocessors, 1989, ASPLOS 1989.
[141] Andris Padegs, et al. Architecture of the IBM System/370, 1978, CACM.
[142] Dean M. Tullsen, et al. Simultaneous multithreading: Maximizing on-chip parallelism, 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[143] David W. Anderson, et al. The IBM System/360 Model 91: machine philosophy and instruction-handling, 1967.
[144] Tom Kilburn, et al. One-Level Storage System, 1962, IRE Trans. Electron. Comput.
[145] Ronny Ronen, et al. Speculation techniques for improving load related instruction scheduling, 1999.
[146] Susan J. Eggers, et al. Reducing false sharing on shared memory multiprocessors through compile time data transformations, 1995, PPOPP '95.
[147] Rajeev Balasubramonian, et al. Memory hierarchy reconfiguration for energy and performance in general-purpose processor architectures, 2000, MICRO 33.
[148] John D. McCalpin, et al. Characterization of simultaneous multithreading (SMT) efficiency in POWER5, 2005, IBM J. Res. Dev.
[149] Glenn Reinman, et al. A Comparative Survey of Load Speculation Architectures, 2000, J. Instr. Level Parallelism.
[150] Leslie Lamport. How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs, 1979, IEEE Transactions on Computers.
[151] Joel S. Emer, et al. Memory dependence prediction using store sets, 1998, Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235).
[152] Norman P. Jouppi. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers, 1990.
[153] Alan Jay Smith. Cache Memories, 1982, CSUR.
[154] D. Burger, et al. Efficient Synchronization: Let Them Eat QOLB, 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[155] Kunle Olukotun, et al. Niagara: a 32-way multithreaded Sparc processor, 2005, IEEE Micro.
[156] Michael C. Huang, et al. Dynamically Tuning Processor Resources with Adaptive Processing, 2003, Computer.
[157] Mark Horowitz, et al. An evaluation of directory schemes for cache coherence, 1998, ISCA '98.
[158] Doug Joseph, et al. Prefetching using Markov predictors, 1997.
[159] Todd M. Austin, et al. SimpleScalar: An Infrastructure for Computer System Modeling, 2002, Computer.
[160] Haitham Akkary, et al. A dynamic multithreading processor, 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[161] Douglas J. Joseph, et al. Prefetching Using Markov Predictors, 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[162] Alec Wolman, et al. The structure and performance of interpreters, 1996, ASPLOS VII.
[163] Ramon Canal, et al. Dynamic cluster assignment mechanisms, 2000, Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6 (Cat. No.PR00550).
[164] Gurindar S. Sohi, et al. ARB: A Hardware Mechanism for Dynamic Reordering of Memory References, 1996, IEEE Trans. Computers.
[165] Andrew R. Pleszkun, et al. Implementation of precise interrupts in pipelined processors, 1985, ISCA '98.
[166] André Seznec, et al. A case for two-way skewed-associative caches, 1993, ISCA '93.
[167] Norman P. Jouppi, et al. Performance of image and video processing with general-purpose processors and media ISA extensions, 1999, ISCA.
[168] Yale N. Patt, et al. The effect of speculatively updating branch history on branch prediction accuracy, revisited, 1994, MICRO 27.
[169] Chris H. Perleberg, et al. Branch Target Buffer Design and Optimization, 1993, IEEE Trans. Computers.
[170] Balaram Sinharoy, et al. POWER4 system microarchitecture, 2002, IBM J. Res. Dev.
[171] William J. Dally. Virtual-channel flow control, 1990.
[172] Daniel A. Jiménez, et al. The impact of delay on the design of branch predictors, 2000, MICRO 33.
[173] Irving L. Traiger, et al. Evaluation Techniques for Storage Hierarchies, 1970, IBM Syst. J.
[174] Burzin A. Patel, et al. Optimization of instruction fetch mechanisms for high issue rates, 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[175] Reinhold Weicker. Dhrystone: a synthetic systems programming benchmark, 1984, CACM.
[176] Alan Jay Smith, et al. Branch Prediction Strategies and Branch Target Buffer Design, 1995, Computer.
[177] George Z. Chrysos, et al. Memory dependence prediction using store sets, 1998.
[178] Arthur J. Bernstein. Analysis of Programs for Parallel Processing, 1966, IEEE Trans. Electron. Comput.
[179] Dezsö Sima, et al. The Design Space of Register Renaming Techniques, 2000, IEEE Micro.
[180] Margaret Martonosi, et al. Cache decay: exploiting generational behavior to reduce cache leakage power, 2001, ISCA 2001.
[181] Sarita V. Adve, et al. Performance of image and video processing with general-purpose processors and media ISA extensions, 1999.
[182] Wei-Fen Lin, et al. Designing a Modern Memory Hierarchy with Hardware Prefetching, 2001, IEEE Trans. Computers.
[183] Philip Levis, et al. Policies for dynamic clock scheduling, 2000, OSDI.
[184] Alan Jay Smith, et al. A class of compatible cache consistency protocols and their support by the IEEE Futurebus, 1986, ISCA '86.
[185] Allan Porterfield, et al. The Tera computer system, 1990.
[186] Stéphan Jourdan, et al. Speculation techniques for improving load related instruction scheduling, 1999, ISCA.
[187] Balaram Sinharoy, et al. IBM Power5 chip: a dual-core multithreaded processor, 2004, IEEE Micro.
[188] Brad Calder, et al. Discovering and Exploiting Program Phases, 2003, IEEE Micro.
[189] Gurindar S. Sohi, et al. Instruction Issue Logic for High-Performance, Interruptible, Multiple Functional Unit, Pipelined Computers, 1990, IEEE Trans. Computers.
[190] David R. Kaeli, et al. Analysis of Temporal-Based Program Behavior for Improved Instruction Cache Performance, 1999, IEEE Trans. Computers.
[191] Peter Petrov, et al. Transforming binary code for low-power embedded processors, 2004, IEEE Micro.
[192] Sumedh W. Sathaye, et al. A technique for object code compatibility in VLIW architectures, 1995, MICRO 1995.
[193] Daniel H. Friendly, et al. Evaluation of Design Options for the Trace Cache Fetch Mechanism, 1999, IEEE Trans. Computers.
[194] Carlo H. Séquin, et al. RISC I: a reduced instruction set VLSI computer, 1981, ISCA '98.
[195] B. Ramakrishna Rau, et al. EPIC: Explicitly Parallel Instruction Computing, 2000, Computer.
[196] Uri C. Weiser, et al. MMX technology extension to the Intel architecture, 1996, IEEE Micro.
[197] Gary S. Tyson, et al. Performance Limits of Trace Caches, 1999, J. Instr. Level Parallelism.
[198] Paul Feautrier, et al. A New Solution to Coherence Problems in Multicache Systems, 1978, IEEE Transactions on Computers.
[199] Martin Hopkins, et al. Synergistic Processing in Cell's Multicore Architecture, 2006, IEEE Micro.
[200] James R. Goodman, et al. Efficient Synchronization: Let Them Eat QOLB, 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[201] R. M. Tomasulo. An efficient algorithm for exploiting multiple arithmetic units, 1967, IBM J. Res. Dev.
[202] Doug Burger, et al. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches, 2002, ASPLOS X.
[203] Susan J. Eggers, et al. Balanced scheduling: instruction scheduling when memory latency is uncertain, 1993, PLDI '93.
[204] Eric Rotenberg, et al. Trace cache: a low latency approach to high bandwidth instruction fetching, 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.