Multithreaded Instruction Sharing
[1] Dean M. Tullsen,et al. Symbiotic jobscheduling for a simultaneous multithreading processor , 2000 .
[2] Paul D. Franzon,et al. FreePDK: An Open-Source Variation-Aware Design Kit , 2007, 2007 IEEE International Conference on Microelectronic Systems Education (MSE'07).
[3] Anoop Gupta,et al. The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.
[4] Gurindar S. Sohi,et al. An empirical analysis of instruction repetition , 1998, ASPLOS VIII.
[5] Antonio González,et al. Trace-level reuse , 1999, Proceedings of the 1999 International Conference on Parallel Processing.
[6] Norman P. Jouppi,et al. Conjoined-Core Chip Multiprocessing , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).
[7] Amir Roth,et al. Three extensions to register integration , 2002, MICRO 35.
[8] Kunle Olukotun,et al. Niagara: a 32-way multithreaded Sparc processor , 2005, IEEE Micro.
[9] Henry P. Moreton,et al. The GeForce 6800 , 2005, IEEE Micro.
[10] Andreas Moshovos,et al. Speculative Memory Cloaking and Bypassing , 1999, International Journal of Parallel Programming.
[11] Gurindar S. Sohi,et al. Register integration: a simple and efficient implementation of squash reuse , 2000, MICRO 33.
[12] Erik Lindholm,et al. NVIDIA Tesla: A Unified Graphics and Computing Architecture , 2008, IEEE Micro.
[13] Kai Li,et al. The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[14] Thierry Gautier,et al. KAAPI: A thread scheduling runtime system for data flow computations on cluster of multi-processors , 2007, PASCO '07.
[15] Steven K. Reinhardt,et al. The impact of resource partitioning on SMT processors , 2003, 2003 12th International Conference on Parallel Architectures and Compilation Techniques.
[16] Frederic T. Chong,et al. Multi-execution: multicore caching for data-similar executions , 2009, ISCA '09.
[17] Norman P. Jouppi,et al. CACTI 6.0: A Tool to Model Large Caches , 2009 .
[18] Matthew Curtis-Maury,et al. Integrating multiple forms of multithreaded execution on multi-SMT systems: a study with scientific applications , 2005, Second International Conference on the Quantitative Evaluation of Systems (QEST'05).
[19] Jung Ho Ahn,et al. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[20] Greg Grohoski. Niagara-2: A highly threaded server-on-a-chip , 2006, 2006 IEEE Hot Chips 18 Symposium (HCS).
[21] Antonio González,et al. Dynamic removal of redundant computations , 1999, ICS '99.
[22] Yao Zhang,et al. Dynamic Detection of Uniform and Affine Vectors in GPGPU Computations , 2009, Euro-Par Workshops.
[23] G.S. Sohi,et al. Dynamic instruction reuse , 1997, ISCA '97.
[24] Guy E. Blelloch,et al. Scheduling threads for constructive cache sharing on CMPs , 2007, SPAA '07.
[25] Stéphan Jourdan,et al. A novel renaming scheme to exploit value temporal locality through physical register reuse and unification , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[26] Israel Koren,et al. An Adaptive Resource Partitioning Algorithm for SMT processors , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[27] Larry Rudolph,et al. Accelerating multi-media processing by implementing memoing in multiplication and division units , 1998, ASPLOS VIII.