Coming challenges in microarchitecture and architecture
Avi Mendelson | Shih-Lien Lu | Konrad K. Lai | Ronny Ronen | F. Pollack | J. P. Shen
[1] Stéphan Jourdan,et al. Speculation techniques for improving load related instruction scheduling , 1999, ISCA.
[2] Keith Diefendorff,et al. Power4 focuses on memory bandwidth , 1999 .
[3] Yale N. Patt,et al. One Billion Transistors, One Uniprocessor, One Chip , 1997, Computer.
[4] R. M. Tomasulo,et al. An efficient algorithm for exploiting multiple arithmetic units , 1995 .
[5] Stéphan Jourdan,et al. A novel renaming scheme to exploit value temporal locality through physical register reuse and unification , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[6] Eric Rotenberg,et al. Trace cache: a low latency approach to high bandwidth instruction fetching , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.
[7] S. McFarling. Combining Branch Predictors , 1993 .
[8] Leon Lantz,et al. Soft errors induced by alpha particles , 1996, IEEE Trans. Reliab..
[9] B. Ramakrishna Rau,et al. EPIC: Explicitly Parallel Instruction Computing , 2000, Computer.
[10] Avtar Saini,et al. Design of the Intel Pentium processor , 1993, Proceedings of 1993 IEEE International Conference on Computer Design ICCD'93.
[11] Huy Nguyen,et al. AltiVec™: bringing vector technology to the PowerPC™ processor family , 1999, 1999 IEEE International Performance, Computing and Communications Conference (Cat. No.99CH36305).
[12] L. Geppert,et al. Transmeta's magic show [microprocessor chips] , 2000 .
[13] P. Bai,et al. A high performance 180 nm generation logic technology , 1998, International Electron Devices Meeting 1998. Technical Digest (Cat. No.98CH36217).
[14] Alan Jay Smith,et al. Cache Memories , 1982, CSUR.
[15] Stamatis Vassiliadis,et al. Architectural effects on dual instruction issue with interlock collapsing ALUs , 1993, Proceedings of Phoenix Conference on Computers and Communications.
[16] Augustus K. Uht,et al. Disjoint eager execution: an optimal form of speculative execution , 1995, Proceedings of the 28th Annual International Symposium on Microarchitecture.
[17] Yoav Almog,et al. eXtended block cache , 2000, Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6 (Cat. No.PR00550).
[18] Dean M. Tullsen,et al. Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading , 1997, TOCS.
[19] Stéphan Jourdan,et al. Early load address resolution via register tracking , 2000, ISCA '00.
[20] Trevor Mudge,et al. The role of adaptivity in two-level adaptive branch prediction , 1995, MICRO 1995.
[21] Andrew F. Glew. MLP yes! ILP no , 1998, ASPLOS 1998.
[22] David B. Papworth. Tuning the Pentium Pro microarchitecture , 1996, IEEE Micro.
[23] Andreas Moshovos,et al. Streamlining inter-operation memory communication via data dependence prediction , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[24] Norman P. Jouppi,et al. The multicluster architecture: reducing cycle time through partitioning , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[25] Shih-Lien Lu,et al. Non-stalling counterflow architecture , 1998, Proceedings 1998 Fourth International Symposium on High-Performance Computer Architecture.
[26] R. Allmon,et al. High-performance microprocessor design , 1998, IEEE J. Solid State Circuits.
[27] Mikko H. Lipasti,et al. Exceeding the dataflow limit via value prediction , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.
[28] Alexandre E. Eichenberger,et al. Stage scheduling: a technique to reduce the register requirements of a modulo schedule , 1995, MICRO 1995.
[29] Steven A. Przybylski,et al. Cache and memory hierarchy design: a performance-directed approach , 1990 .
[30] Eric Rotenberg,et al. AR-SMT: a microarchitectural approach to fault tolerance in microprocessors , 1999, Digest of Papers. Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing (Cat. No.99CB36352).
[31] M. Bohr. Interconnect scaling-the real limiter to high performance ULSI , 1995, Proceedings of International Electron Devices Meeting.
[32] Bruce D. Shriver,et al. The anatomy of a high-performance microprocessor - a systems perspective , 1998 .
[33] Jean-Loup Baer,et al. Effective Hardware Based Data Prefetching for High-Performance Processors , 1995, IEEE Trans. Computers.
[34] James E. Smith,et al. The predictability of data values , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[35] Avi Mendelson,et al. Performance and hardware complexity tradeoffs in designing multithreaded architectures , 1996, Proceedings of the 1996 Conference on Parallel Architectures and Compilation Technique.
[36] John Paul Shen,et al. The block-based trace cache , 1999, ISCA.
[37] J. Maiz,et al. Alpha-SER modeling and simulation for sub-0.25 µm CMOS technology , 1999, 1999 Symposium on VLSI Technology. Digest of Technical Papers (IEEE Cat. No.99CH36325).
[38] G.S. Sohi,et al. Dynamic instruction reuse , 1997, ISCA '97.
[39] Norman P. Jouppi,et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[40] Norman P. Jouppi,et al. Performance of image and video processing with general-purpose processors and media ISA extensions , 1999, ISCA.
[41] Yale N. Patt,et al. Putting the fill unit to work: dynamic optimizations for trace cache microprocessors , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[42] M.J. Flynn,et al. Deep submicron microprocessor design issues , 1999, IEEE Micro.
[43] David W. Wall,et al. Limits of instruction-level parallelism , 1991, ASPLOS IV.
[44] Fred Weber,et al. AMD 3DNow! technology: architecture and implementations , 1999, IEEE Micro.
[45] Avi Mendelson,et al. Using value prediction to increase the power of speculative execution hardware , 1998, TOCS.
[46] Richard E. Kessler,et al. The Alpha 21264 microprocessor , 1999, IEEE Micro.
[47] R. Senthinathan,et al. A 600 MHz IA-32 microprocessor with enhanced data streaming for graphics and video , 1999, 1999 IEEE International Solid-State Circuits Conference. Digest of Technical Papers. ISSCC. First Edition (Cat. No.99CH36278).
[48] Todd M. Austin,et al. DIVA: a reliable substrate for deep submicron microarchitecture design , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.
[49] S. Richardson. Caching Function Results: Faster Arithmetic by Avoiding Unnecessary Computation , 1992 .
[50] Doug Matzke,et al. Will Physical Scalability Sabotage Performance Gains? , 1997, Computer.
[51] Donald S. Fussell,et al. Energy-efficient instruction set architecture for CMOS microprocessors , 1995, Proceedings of the Twenty-Eighth Annual Hawaii International Conference on System Sciences.
[52] Pius Ng,et al. A comparison of superscalar and decoupled access/execute architectures , 1993, MICRO 1993.
[53] James E. Smith,et al. Complexity-Effective Superscalar Processors , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[54] Thomas D. Burd,et al. The simulation and evaluation of dynamic voltage scaling algorithms , 1998, Proceedings. 1998 International Symposium on Low Power Electronics and Design (IEEE Cat. No.98TH8379).
[55] Yale N. Patt,et al. An effective programmable prefetch engine for on-chip caches , 1995, MICRO 1995.
[56] Eric Rotenberg,et al. Assigning confidence to conditional branch predictions , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.
[57] J. E. Thornton,et al. Parallel operation in the Control Data 6600 , 1964, AFIPS '64 (Fall, part II).
[58] Shekhar Y. Borkar,et al. Design challenges of technology scaling , 1999, IEEE Micro.
[59] Gurindar S. Sohi,et al. Multiscalar processors , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[60] Kewal K. Saluja,et al. A study of time-redundant fault tolerance techniques for high-performance pipelined computers , 1989, [1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers.
[61] Yale N. Patt,et al. Alternative implementations of hybrid branch predictors , 1995, Proceedings of the 28th Annual International Symposium on Microarchitecture.
[62] Yale N. Patt,et al. Branch history table indexing to prevent pipeline bubbles in wide-issue superscalar processors , 1993, Proceedings of the 26th Annual International Symposium on Microarchitecture.
[63] Haitham Akkary,et al. A dynamic multithreading processor , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[64] Uri C. Weiser,et al. Correlated load-address predictors , 1999, ISCA.
[65] Joseph A. Fisher,et al. Very Long Instruction Word architectures and the ELI-512 , 1983, ISCA '83.