A first glance at Kilo-instruction based multiprocessors
Mateo Valero | Adrián Cristal | Valentin Puente | José-Ángel Gregorio | Ramón Beivide | Marco Galluzzi
[1] Cruz Izu,et al. On the Design of a High-Performance Adaptive Router for CC-NUMA Multiprocessors , 2003, IEEE Trans. Parallel Distributed Syst..
[2] Jean-Loup Baer,et al. Cost-effective compiler directed memory prefetching and bypassing , 2002, Proceedings. International Conference on Parallel Architectures and Compilation Techniques.
[3] Anoop Gupta,et al. SPLASH: Stanford parallel applications for shared-memory , 1992, CARN.
[4] T. N. Vijaykumar,et al. Is SC + ILP = RC? , 1999, ISCA.
[5] R. M. Tomasulo,et al. An efficient algorithm for exploiting multiple arithmetic units , 1995 .
[6] Carmen Carrión,et al. A flow control mechanism to avoid message deadlock in k-ary n-cube networks , 1997, Proceedings Fourth International Conference on High-Performance Computing.
[7] M. Dubois,et al. Assisted Execution , 1998 .
[8] Josep Torrellas,et al. Using a user-level memory thread for correlation prefetching , 2002, ISCA.
[9] Mateo Valero,et al. Delaying physical register allocation through virtual-physical registers , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.
[10] John Paul Shen,et al. Dynamic speculative precomputation , 2001, MICRO.
[11] Anoop Gupta,et al. Design and evaluation of a compiler algorithm for prefetching , 1992, ASPLOS V.
[12] Cruz Izu,et al. The Adaptive Bubble Router , 2001, J. Parallel Distributed Comput..
[13] C.B. Stunkel,et al. A New Switch Chip for IBM RS/6000 SP Systems , 1999, ACM/IEEE SC 1999 Conference (SC'99).
[14] Josep Llosa,et al. Out-of-order commit processors , 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).
[15] Craig Zilles,et al. Execution-based prediction using speculative slices , 2001, ISCA 2001.
[16] T. N. Vijaykumar,et al. Reducing Design Complexity of the Load/Store Queue , 2003, MICRO.
[17] Sally A. McKee,et al. Hitting the memory wall: implications of the obvious , 1995, CARN.
[18] Yale N. Patt,et al. Checkpoint repair for out-of-order execution machines , 1987, ISCA '87.
[19] Per Stenström,et al. A prefetching technique for irregular accesses to linked data structures , 2000, Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6 (Cat. No.PR00550).
[20] Alan Jay Smith,et al. Cache Memories , 1982, CSUR.
[21] Haitham Akkary,et al. Checkpoint Processing and Recovery: Towards Scalable Large Instruction Window Processors , 2003, MICRO.
[22] Jean-Loup Baer,et al. An effective on-chip preloading scheme to reduce data access penalty , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[23] Sarita V. Adve,et al. Using speculative retirement and larger instruction windows to narrow the performance gap between memory consistency models , 1997, SPAA '97.
[24] Douglas J. Joseph,et al. Prefetching Using Markov Predictors , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[25] Shubhendu S. Mukherjee,et al. The Alpha 21364 network architecture , 2001, HOT Interconnects 9. Symposium on High Performance Interconnects.
[26] Sarita V. Adve,et al. RSIM: An Execution-Driven Simulator for ILP-Based Shared-Memory Multiprocessors and Uniprocessors , 1997 .
[27] David A. Patterson,et al. Computer Architecture: A Quantitative Approach , 1969 .
[28] Henry M. Levy,et al. An Architecture for Software-Controlled Data Prefetching , 1991, ISCA.
[29] David F. Heidel,et al. An Overview of the BlueGene/L Supercomputer , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[30] Josep Llosa,et al. Kilo-instruction Processors , 2003, ISHPC.
[31] Valentin Puente,et al. SICOSYS: an integrated framework for studying interconnection network performance in multiprocessor systems , 2002, Proceedings 10th Euromicro Workshop on Parallel, Distributed and Network-based Processing.
[32] Simha Sethumadhavan,et al. Scalable hardware memory disambiguation for high-ILP processors , 2003, IEEE Micro.
[33] Mike Galles. Spider: a high-speed network interconnect , 1997, IEEE Micro.
[34] Anoop Gupta,et al. Performance evaluation of memory consistency models for shared-memory multiprocessors , 1991, ASPLOS IV.
[35] Anoop Gupta,et al. The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.
[36] Gurindar S. Sohi,et al. Speculative data-driven multithreading , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.
[37] Josep Llosa,et al. Large virtual ROBs by processor checkpointing , 2002 .
[38] Josep Llosa,et al. A case for resource-conscious out-of-order processors , 2004, IEEE Computer Architecture Letters.
[39] Yale N. Patt,et al. Simultaneous subordinate microthreading (SSMT) , 1999, ISCA.
[40] Anoop Gupta,et al. Hiding memory latency using dynamic scheduling in shared-memory multiprocessors , 1992, ISCA '92.