Summary of multi-core hardware and programming model investigations
[1] Vivek Sarkar,et al. Baring It All to Software: Raw Machines , 1997, Computer.
[2] G.E. Moore,et al. Cramming More Components Onto Integrated Circuits , 1998, Proceedings of the IEEE.
[3] Jung Ho Ahn,et al. Merrimac: Supercomputing with Streams , 2003, ACM/IEEE SC 2003 Conference (SC'03).
[4] W. Daniel Hillis,et al. Data parallel algorithms , 1986, CACM.
[5] Brett H. Meyer,et al. Amdahl’s Law Revisited for Single Chip Systems , 2007, International Journal of Parallel Programming.
[6] Balaram Sinharoy,et al. IBM Power5 chip: a dual-core multithreaded processor , 2004, IEEE Micro.
[7] Jason Duell,et al. Productivity and performance using partitioned global address space languages , 2007, PASCO '07.
[8] Suzanne M. Kelly,et al. Software Architecture of the Light Weight Kernel, Catamount , 2005 .
[9] Keith D. Underwood,et al. Analyzing the Scalability of Graph Algorithms on Eldorado , 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[10] Guang R. Gao,et al. Toward a Software Infrastructure for the Cyclops-64 Cellular Architecture , 2006, 20th International Symposium on High-Performance Computing in an Advanced Collaborative Environment (HPCS'06).
[11] Edward A. Lee. The problem with threads , 2006, Computer.
[12] Patrick Crowley,et al. Dynamic thread assignment on heterogeneous multiprocessor architectures , 2006, CF '06.
[13] Maurice Herlihy,et al. Transactional Memory: Architectural Support For Lock-free Data Structures , 1993, Proceedings of the 20th Annual International Symposium on Computer Architecture.
[14] Guang R. Gao,et al. Synchronization state buffer: supporting efficient fine-grain synchronization on many-core architectures , 2007, ISCA '07.
[15] Keith D. Underwood,et al. Implementation and Performance of Portals 3.3 on the Cray XT3 , 2005, 2005 IEEE International Conference on Cluster Computing.
[16] Seetharami R. Seelam,et al. Modeling the Impact of Checkpoints on Next-Generation Systems , 2007, 24th IEEE Conference on Mass Storage Systems and Technologies (MSST 2007).
[17] A. Kumar,et al. Implementation of an 8-Core, 64-Thread, Power-Efficient SPARC Server on a Chip , 2008, IEEE Journal of Solid-State Circuits.
[18] Larry Rudolph,et al. Efficient synchronization of multiprocessors with shared memory , 1988, TOPL.
[19] Samuel Williams,et al. The potential of the cell processor for scientific computing , 2005, CF '06.
[20] Sally A. McKee,et al. Hitting the memory wall: implications of the obvious , 1995, CARN.
[21] Norman P. Jouppi,et al. Fast synchronization for chip multiprocessors , 2005, CARN.
[22] Jialin Ju,et al. ARMCI: A Portable Aggregate Remote Memory Copy Interface , 2000 .
[23] Katherine Yelick,et al. Introduction to UPC and Language Specification , 2000 .
[24] Henry Hoffmann,et al. Evaluation of the Raw microprocessor: an exposed-wire-delay architecture for ILP and streams , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.
[25] Mateo Valero,et al. Proceedings of the 2nd conference on Computing frontiers , 2005, CF 2008.
[26] Robert J. Harrison,et al. Global Arrays: a portable "shared-memory" programming model for distributed memory computers , 1994, Proceedings of Supercomputing '94.
[27] R. Kumar,et al. An Integrated Quad-Core Opteron Processor , 2007, 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers.
[28] William J. Dally,et al. Stream Processors: Programmability and Efficiency , 2004, ACM Queue.
[29] Kunle Olukotun,et al. Niagara: a 32-way multithreaded Sparc processor , 2005, IEEE Micro.
[30] Christopher J. Hughes,et al. Carbon: architectural support for fine-grained parallelism on chip multiprocessors , 2007, ISCA '07.
[31] Pat Conway,et al. The AMD Opteron Processor for Multiprocessor Servers , 2003, IEEE Micro.
[32] Eric M. Schwarz,et al. IBM POWER6 microarchitecture , 2007, IBM J. Res. Dev.
[33] Tarek El-Ghazawi,et al. Developing an Optimized UPC Compiler for Future Architectures , 2005 .