Low-complexity vector microprocessor extension
[1] George L.-T. Chiu,et al. Overview of the Blue Gene/L system architecture , 2005, IBM J. Res. Dev..
[2] Christoforos E. Kozyrakis,et al. A case for intelligent RAM , 1997, IEEE Micro.
[3] John K. Ousterhout. Scheduling Techniques for Concurrent Systems , 1982, ICDCS 1982.
[4] Larry Rudolph,et al. Gang Scheduling Performance Benefits for Fine-Grain Synchronization , 1992, J. Parallel Distributed Comput..
[5] Tzi-cker Chiueh. Multi-threaded vectorization , 1991, ISCA '91.
[6] P.H. Worley,et al. Early Evaluation of the Cray X1 , 2003, ACM/IEEE SC 2003 Conference (SC'03).
[7] Sally A. McKee,et al. Hitting the memory wall: implications of the obvious , 1995, CARN.
[8] Ahmed Sameh,et al. The Illiac IV system , 1972 .
[9] Mateo Valero,et al. Adding a vector unit to a superscalar processor , 1999, ICS '99.
[10] Ole Agesen,et al. A comparison of software and hardware techniques for x86 virtualization , 2006, ASPLOS XII.
[11] James E. Smith,et al. Decoupled access/execute computer architectures , 1984, TOCS.
[12] Chris R. Jesshope. Implementing an efficient vector instruction set in a chip multi-processor using micro-threaded pipelines , 2001 .
[13] Jeffrey S. Vetter. Cray X1 Evaluation Status Report , 2004 .
[14] Guy E. Blelloch,et al. AD-A 270 601 Segmented Operations for Sparse Matrix Computation on Vector Multiprocessors , 1993 .
[15] Carl Staelin,et al. lmbench: Portable Tools for Performance Analysis , 1996, USENIX Annual Technical Conference.
[16] James L. Peterson,et al. Design and validation of a performance and power simulator for PowerPC systems , 2003, IBM J. Res. Dev..
[17] Balaram Sinharoy,et al. POWER4 system microarchitecture , 2002, IBM J. Res. Dev..
[18] P. Colella,et al. Local adaptive mesh refinement for shock hydrodynamics , 1989 .
[19] Yu Zhang,et al. Parallelization of IBM mambo system simulator in functional modes , 2008, OPSR.
[20] Karthick Rajamani,et al. Application of full-system simulation in exploratory system design and development , 2006, IBM J. Res. Dev..
[21] T. Skotnicki,et al. The end of CMOS scaling: toward the introduction of new materials and structural changes to improve MOSFET performance , 2005, IEEE Circuits and Devices Magazine.
[22] Ronald G. Dreslinski,et al. Analysis of hardware prefetching across virtual page boundaries , 2007, CF '07.
[23] Dean M. Tullsen,et al. Simultaneous multithreading: a platform for next-generation processors , 1997, IEEE Micro.
[24] Scott Devine,et al. Using the SimOS machine simulator to study complex computer systems , 1997, TOMC.
[25] Michael J. Flynn,et al. Intrinsic multiprocessing , 1967, AFIPS '67 (Spring).
[26] Rajeev Balasubramonian,et al. Reducing the complexity of the register file in dynamic superscalar processors , 2001, Proceedings. 34th ACM/IEEE International Symposium on Microarchitecture. MICRO-34.
[27] Emil Talpes,et al. Execution cache-based microarchitecture for power-efficient superscalar processors , 2005, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[28] Michael Wolfe,et al. High performance compilers for parallel computing , 1995 .
[29] Larry L. Biro,et al. Power considerations in the design of the Alpha 21264 microprocessor , 1998, Proceedings 1998 Design and Automation Conference. 35th DAC. (Cat. No.98CH36175).
[30] Steven L. Scott,et al. Synchronization and communication in the T3E multiprocessor , 1996, ASPLOS VII.
[31] Zhen Fang,et al. The Impulse Memory Controller , 2001, IEEE Trans. Computers.
[32] R. Kumar,et al. An Integrated Quad-Core Opteron Processor , 2007, 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers.
[33] Kentaro Shimada,et al. A superscalar RISC processor with 160 FPRs for large scale scientific processing , 1999, Proceedings 1999 IEEE International Conference on Computer Design: VLSI in Computers and Processors (Cat. No.99CB37040).
[34] W. Wasow,et al. Finite-Difference Methods for Partial Differential Equations , 1961 .
[35] Michael J. Flynn,et al. Very high-speed computing systems , 1966 .
[36] John Wawrzynek,et al. T0: A Single-Chip Vector Microprocessor with Reconfigurable Pipelines , 1996, ESSCIRC '96: Proceedings of the 22nd European Solid-State Circuits Conference.
[37] Werner Buchholz. The IBM System/370 Vector Architecture , 1986, IBM Syst. J..
[38] Josep Torrellas,et al. A Brief Description of the NMP ISA and Benchmarks , 2005 .
[39] Greg Grohoski. Niagara-2: A highly threaded server-on-a-chip , 2006, 2006 IEEE Hot Chips 18 Symposium (HCS).
[40] Katherine A. Yelick,et al. Evaluating support for global address space languages on the Cray X1 , 2004, ICS '04.
[41] Josep Torrellas,et al. A Near-Memory Processor for Vector, Streaming and Bit Manipulation Workloads , 2005 .
[42] Katherine Yelick,et al. OSKI: A library of automatically tuned sparse matrix kernels , 2005 .
[43] David A. Patterson,et al. Vector vs. superscalar and VLIW architectures for embedded multimedia benchmarks , 2002, 35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002. (MICRO-35). Proceedings..
[44] Lixin Zhang,et al. Mambo: a full system simulator for the PowerPC architecture , 2004, PERV.
[45] Mateo Valero,et al. Decoupled vector architectures , 1996, Proceedings. Second International Symposium on High-Performance Computer Architecture.
[46] Edmund L. Wong,et al. Polymorphous Computing Architecture (PCA) Kernel-Level Benchmarks , 2005 .
[47] C. Lemuet,et al. The Potential Energy Efficiency of Vector Acceleration , 2006, ACM/IEEE SC 2006 Conference (SC'06).
[48] P. Brandimarte. Finite Difference Methods for Partial Differential Equations , 2006 .
[49] Richard M. Russell,et al. The CRAY-1 computer system , 1978, CACM.
[50] Steve Pawlowski. Petascale Computing Research Challenges - A Manycore Perspective , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[51] Mahmut T. Kandemir,et al. Hardware and Software Techniques for Controlling DRAM Power Modes , 2001, IEEE Trans. Computers.
[52] Sadaf R. Alam,et al. Early evaluation of the Cray XT3 , 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.
[53] Eric Rotenberg,et al. A large, fast instruction window for tolerating cache misses , 2002, ISCA.
[54] Larry Rudolph,et al. Distributed hierarchical control for parallel processing , 1990, Computer.
[55] Dirk Grunwald,et al. Pipeline gating: speculation control for energy reduction , 1998, ISCA.
[56] David A. Patterson,et al. Scalable Vector Media-processors for Embedded Systems , 2002 .
[57] Mateo Valero,et al. Out-of-order vector architectures , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[58] Brad Calder,et al. Predictor-directed stream buffers , 2000, MICRO 33.
[59] Gary S. Tyson,et al. On high-bandwidth data cache design for multi-issue processors , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[60] Samuel Williams,et al. Improving Memory Subsystem Performance Using ViVA: Virtual Vector Architecture , 2009, ARCS.
[61] Brian B. Moore,et al. The IBM System/370 Vector Architecture: Design Considerations , 1988, IEEE Trans. Computers.
[62] J. Little. A Proof for the Queuing Formula: L = λW , 1961 .
[63] Mateo Valero,et al. A performance study of out-of-order vector architectures and short registers , 1998, ICS '98.
[64] James E. Smith,et al. Vector instruction set support for conditional operations , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[65] David A. Patterson,et al. Latency lags bandwidth , 2004, CACM.
[66] Dileep Bhandarkar,et al. VAX vector architecture , 1990, ISCA '90.
[67] Corinna G. Lee,et al. Initial results on the performance and cost of vector microprocessors , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[68] Sally A. McKee,et al. Reflections on the memory wall , 2004, CF '04.
[69] Vladimir M. Pentkovski,et al. Implementing Streaming SIMD Extensions on the Pentium III Processor , 2000, IEEE Micro.
[70] Eric M. Schwarz,et al. IBM POWER6 microarchitecture , 2007, IBM J. Res. Dev..
[71] Stephen Phillips. VictoriaFalls: Scaling highly-threaded processor cores , 2007, 2007 IEEE Hot Chips 19 Symposium (HCS).
[72] Richard E. Kessler,et al. Evaluating stream buffers as a secondary cache replacement , 1994, Proceedings of the 21st International Symposium on Computer Architecture.
[73] John Wawrzynek,et al. Vector microprocessors , 1998 .
[74] Wilson C. Hsieh,et al. Impulse: Memory system support for scientific applications , 1999, Sci. Program..
[75] Mateo Valero,et al. Vector architectures: past, present and future , 1998, ICS '98.
[76] John D. C. Little. A Proof for the Queuing Formula , 1961 .
[77] Christopher Batten,et al. The Vector-Thread Architecture , 2004, ISCA 2004.
[78] Christoforos Kozyrakis,et al. A Media-Enhanced Vector Architecture for Embedded Memory Systems , 1999 .
[79] Matthew Mattina,et al. Tarantula: a vector extension to the alpha architecture , 2002, Proceedings 29th Annual International Symposium on Computer Architecture.
[80] Avi Mendelson,et al. Micro-operation cache: a power aware frontend for variable instruction length ISA , 2003, IEEE Trans. Very Large Scale Integr. Syst..
[81] Samuel Williams,et al. The Landscape of Parallel Computing Research: A View from Berkeley , 2006 .
[82] Margaret Martonosi,et al. Wattch: a framework for architectural-level power analysis and optimizations , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[83] David A. Patterson,et al. Computer Architecture - A Quantitative Approach, 5th Edition , 1996 .
[84] Uri C. Weiser,et al. MMX technology extension to the Intel architecture , 1996, IEEE Micro.
[85] Yuen H. Chan,et al. IBM POWER6 SRAM arrays , 2007, IBM J. Res. Dev..
[86] David A. Patterson,et al. Computer Architecture: A Quantitative Approach , 1969 .
[87] Norman P. Jouppi,et al. Fast synchronization for chip multiprocessors , 2005, CARN.
[88] Hiroshi Nakamura,et al. Evaluation of pseudo vector processor based on slide-windowed registers , 1994, 1994 Proceedings of the Twenty-Seventh Hawaii International Conference on System Sciences.
[89] I. Duff,et al. Direct Methods for Sparse Matrices , 1987 .
[90] Mateo Valero,et al. Multithreaded vector architectures , 1997, Proceedings Third International Symposium on High-Performance Computer Architecture.