Design and implementation of a multipurpose cluster system network interface unit
[1] Eric A. Brewer,et al. Remote queues: exposing message queues for optimization and atomicity , 1995, SPAA '95.
[2] Vivek Sarkar,et al. Location Consistency: Stepping Beyond the Memory Coherence Barrier , 1995, ICPP.
[3] Erik Hagersten,et al. Simple COMA node implementations , 1994, 1994 Proceedings of the Twenty-Seventh Hawaii International Conference on System Sciences.
[4] Michael S. Ehrlich,et al. StarT-jr: a parallel system from commodity technology , 1997 .
[5] Mark D. Hill,et al. The impact of data transfer and buffering alternatives on network interface design , 1998, Proceedings 1998 Fourth International Symposium on High-Performance Computer Architecture.
[6] G. A. Boughton,et al. Arctic routing chip , 1994, Symposium Record Hot Interconnects II.
[7] Milon Mackey,et al. An implementation of the Hamlyn sender-managed interface architecture , 1996, OSDI '96.
[8] William J. Dally,et al. The message-driven processor: a multicomputer processing node with efficient mechanisms , 1992, IEEE Micro.
[9] Cathy May,et al. The PowerPC Architecture: A Specification for a New Family of RISC Processors , 1994 .
[10] Michael L. Scott,et al. Algorithms for scalable synchronization on shared-memory multiprocessors , 1991, TOCS.
[11] John B. Carter,et al. An argument for simple COMA , 1995, Future Gener. Comput. Syst..
[12] Jack Dongarra,et al. PVM: Parallel virtual machine: a users' guide and tutorial for networked parallel computing , 1995 .
[13] Charles E. Leiserson,et al. Fat-trees: Universal networks for hardware-efficient supercomputing , 1985, IEEE Transactions on Computers.
[14] Michael L. Scott,et al. Software cache coherence for large scale multiprocessors , 1995, Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture.
[15] Michel Dubois,et al. Formal verification of delayed consistency protocols , 1996, Proceedings of International Conference on Parallel Processing.
[16] Thorsten von Eicken,et al. Evolution of the Virtual Interface Architecture , 1998, Computer.
[17] P. Pierce,et al. The NX/2 operating system , 1988, C3P.
[18] David L. Dill,et al. The Murphi Verification System , 1996, CAV.
[19] Donald Yeung,et al. Sparcle: an evolutionary processor design for large-scale multiprocessors , 1993, IEEE Micro.
[20] James K. Archibald,et al. Cache coherence protocols: evaluation using a multiprocessor simulation model , 1986, TOCS.
[21] Scott Pakin,et al. Fast messages: efficient, portable communication for workstation clusters and MPPs , 1997, IEEE Concurrency.
[22] David L. Dill,et al. Verification of Cache Coherence Protocols by Aggregation of Distributed Transactions , 1998, Theory of Computing Systems.
[23] R. E. Kessler,et al. Cray T3D: a new dimension for Cray Research , 1993, Digest of Papers. Compcon Spring.
[24] Richard B. Gillett. Memory Channel Network for PCI , 1996, IEEE Micro.
[25] James R. Larus,et al. Fine-grain access control for distributed shared memory , 1994, ASPLOS VI.
[26] James R. Goodman,et al. Efficient Synchronization: Let Them Eat QOLB , 1997, International Symposium on Computer Architecture.
[27] Somesh Jha,et al. Verification of the Futurebus+ cache coherence protocol , 1993, Formal Methods Syst. Des..
[28] Jack J. Dongarra,et al. A message passing standard for MPP and workstations , 1996, CACM.
[29] Anoop Gupta,et al. The DASH prototype: implementation and performance , 1992, ISCA '92.
[30] Jean-Loup Baer,et al. Two techniques for improving performance on bus-based multiprocessors , 1995, Future Gener. Comput. Syst..
[31] Michel Dubois,et al. Combined performance gains of simple cache protocol extensions , 1994, ISCA '94.
[32] Michael L. Scott,et al. Synchronization without contention , 1991, ASPLOS IV.
[33] Tom Lovett,et al. STiNG: A CC-NUMA Computer System for the Commercial Marketplace , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).
[34] James C. Hoe,et al. START-NG: Delivering Seamless Parallel Computing , 1995, Euro-Par.
[35] Rishiyur S. Nikhil,et al. Cid: A Parallel, "Shared-Memory" C for Distributed-Memory Machines , 1994, LCPC.
[36] D.A. Wood,et al. Reactive NUMA: A Design For Unifying S-COMA And CC-NUMA , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[37] M. J. Beckerle,et al. *T: integrated building blocks for parallel computing , 1993, Supercomputing '93.
[38] Pat Helland,et al. The Mercury Interconnect Architecture: A Cost-effective Infrastructure For High-performance Servers , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[39] Arvind,et al. *T: A Multithreaded Massively Parallel Architecture , 1992, Proceedings of the 19th Annual International Symposium on Computer Architecture.
[40] Bradley C. Kuszmaul,et al. Cilk: an efficient multithreaded runtime system , 1995, PPOPP '95.
[41] Victor Luchangco,et al. Computation-centric memory models , 1998, SPAA '98.
[42] Victor Lee,et al. Exploiting two-case delivery for fast protected messaging , 1998, Proceedings 1998 Fourth International Symposium on High-Performance Computer Architecture.
[43] John Kubiatowicz,et al. Integrated shared-memory and message-passing communication in the Alewife multiprocessor , 1998 .
[44] William J. Dally,et al. Architecture and implementation of the reliable router , 1994, Symposium Record Hot Interconnects II.
[45] Michael Alexander,et al. Designing the PowerPC 60X bus , 1994, IEEE Micro.
[46] J. Larus,et al. Tempest and Typhoon: user-level shared memory , 1994, Proceedings of the 21st International Symposium on Computer Architecture.
[47] Anoop Gupta,et al. Optimized multiprocessor communication and synchronization using a programmable protocol engine , 1998 .
[48] Mary K. Vernon,et al. Efficient synchronization primitives for large-scale cache-coherent multiprocessors , 1989, ASPLOS III.
[49] James C. Hoe. StarT-X - A One-Man-Year Exercise in Network Interface Engineering , 1998 .
[50] Seth Copen Goldstein,et al. Active Messages: A Mechanism for Integrated Communication and Computation , 1992, Proceedings of the 19th Annual International Symposium on Computer Architecture.
[51] Ramesh Subramonian,et al. LogP: towards a realistic model of parallel computation , 1993, PPOPP '93.
[52] David Chaiken,et al. The Alewife CMMU: Addressing the Multiprocessor Communications Gap , 1994 .
[53] Alan L. Cox,et al. TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems , 1994, USENIX Winter.
[54] Brian N. Bershad,et al. The Midway distributed shared memory system , 1993, Digest of Papers. Compcon Spring.
[55] Kourosh Gharachorloo,et al. Shasta: a low overhead, software-only approach for supporting fine-grain shared memory , 1996, ASPLOS VII.
[56] Josep Torrellas,et al. The Augmint multiprocessor simulation toolkit for Intel x86 architectures , 1996, Proceedings International Conference on Computer Design. VLSI in Computers and Processors.
[57] Charles L. Seitz,et al. Myrinet: A Gigabit-per-Second Local Area Network , 1995, IEEE Micro.
[58] Steven L. Scott,et al. Synchronization and communication in the T3E multiprocessor , 1996, ASPLOS VII.
[59] Ronald Minnich,et al. The memory integrated network interface , 1994, Symposium Record Hot Interconnects II.
[60] W. Daniel Hillis,et al. The Network Architecture of the Connection Machine CM-5 , 1996, J. Parallel Distributed Comput..
[61] Larry Rudolph,et al. CACHET: an adaptive cache coherence protocol for distributed shared-memory systems , 1999, ICS '99.
[62] Paul Hudak,et al. Memory coherence in shared virtual memory systems , 1986, PODC '86.
[63] Larry Rudolph,et al. Commit-reconcile & fences (CRF): a new memory model for architects and compiler writers , 1999, ISCA.
[64] James C. Hoe. Effective parallel computation on workstation cluster with a user-level communication network , 1994 .
[65] Fong Pong. Symbolic state model: a new approach for the verification of cache coherence protocols , 1996 .
[66] Document for a Standard Message-Passing Interface , 1993 .
[67] Kenneth M. Mackenzie,et al. An efficient virtual network interface in the FUGU scalable workstation , 1998 .
[68] John L. Hennessy,et al. The FLASH Multiprocessor: Designing a Flexible and Scalable System , 1998 .
[69] Kenneth L. McMillan,et al. Symbolic model checking: an approach to the state explosion problem , 1992 .
[70] Kirk L. Johnson,et al. CRL: high-performance all-software distributed shared memory , 1995, SOSP.
[71] D. Lenoski,et al. The SGI Origin: A ccNUMA Highly Scalable Server , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[72] Anoop Gupta,et al. The directory-based cache coherence protocol for the DASH multiprocessor , 1990, ISCA '90.
[73] David A. Wood,et al. Decoupled Hardware Support for Distributed Shared Memory , 1996, ISCA.
[74] Jon Beecroft,et al. Meiko CS-2 Interconnect Elan-Elite Design , 1994, Parallel Comput..
[75] Tilak Agerwala,et al. SP2 System Architecture , 1999, IBM Syst. J..
[76] David L. Dill,et al. Verification of FLASH cache coherence protocol by aggregation of distributed transactions , 1996, SPAA '96.
[77] Anna R. Karlin,et al. Competitive snoopy caching , 1986, 27th Annual Symposium on Foundations of Computer Science (sfcs 1986).
[78] Mike Galles. Spider: a high-speed network interconnect , 1997, IEEE Micro.
[79] G. Andrew Boughton. Arctic Switch Fabric , 1997, PCRCW.
[80] Babak Falsafi,et al. Coherent Network Interfaces for Fine-Grain Communication , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).
[81] Scott B. Marovich,et al. Hamlyn: a high-performance network interface with sender-based memory management , 1995 .
[82] Allan Porterfield,et al. The Tera computer system , 1990 .
[83] Greg J. Regnier,et al. The Virtual Interface Architecture , 2002, IEEE Micro.
[84] Anant Agarwal,et al. FUGU: Implementing Translation and Protection in a Multiuser, Multimodel Multiprocessor , 1994 .
[85] Liviu Iftode,et al. Scope Consistency: A Bridge between Release Consistency and Entry Consistency , 1996, SPAA '96.
[86] Michael C. Browne,et al. Exploiting Parallelism in Cache Coherency Protocol Engines , 1995, Euro-Par.
[87] A. Gupta,et al. The Stanford FLASH multiprocessor , 1994, Proceedings of the 21st International Symposium on Computer Architecture.