Integrated shared-memory and message-passing communication in the Alewife multiprocessor
[1] Guy L. Steele,et al. Data Optimization: Allocation of Arrays to Reduce Communication on SIMD Machines , 1990, J. Parallel Distributed Comput..
[2] Dean M. Tullsen,et al. Simultaneous multithreading: Maximizing on-chip parallelism , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[3] David Chaiken,et al. Latency Tolerance through Multithreading in Large-Scale Multiprocessors , 1991 .
[4] Henry M. Levy,et al. Efficient Support for Multicomputing on ATM Networks , 1993 .
[5] A. Gupta,et al. The Stanford FLASH multiprocessor , 1994, Proceedings of 21st International Symposium on Computer Architecture.
[6] Seth Copen Goldstein,et al. Active messages: a mechanism for integrating communication and computation , 1998, ISCA '98.
[7] Ricardo Bianchini,et al. The MIT Alewife machine: architecture and performance , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[8] Anant Agarwal,et al. Integrating message-passing and shared-memory: early experience , 1993, SIGP.
[9] Anoop Gupta,et al. The DASH Prototype: Logic Overhead and Performance , 1993, IEEE Trans. Parallel Distributed Syst..
[10] Henry M. Levy,et al. A comparison of message passing and shared memory architectures for data parallel programs , 1994, ISCA '94.
[11] Anoop Gupta,et al. Memory consistency and event ordering in scalable shared-memory multiprocessors , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[12] David Chaiken,et al. CACHE COHERENCE PROTOCOLS FOR LARGE-SCALE MULTIPROCESSORS , 1990 .
[13] Thorsten von Eicken,et al. U-Net: a user-level network interface for parallel and distributed computing , 1995, SOSP.
[14] Donald Yeung,et al. Sparcle: an evolutionary processor design for large-scale multiprocessors , 1993, IEEE Micro.
[15] Anoop Gupta,et al. SPLASH: Stanford parallel applications for shared-memory , 1992, CARN.
[16] Timothy Mark Pinkston,et al. On Deadlocks in Interconnection Networks , 1997, ISCA.
[17] Wilson C. Hsieh,et al. Optimistic active messages: a mechanism for scheduling communication with computation , 1995, PPOPP '95.
[18] Andrew A. Chien,et al. The J-Machine: A Fine Grain Concurrent Computer , 1989 .
[19] Michel Dubois,et al. Synchronization, coherence, and event ordering in multiprocessors , 1988, Computer.
[20] James R. Larus,et al. Cooperative shared memory: software and hardware for scalable multiprocessors , 1993, TOCS.
[21] Douglas W. Clark. Large-Scale Hardware Simulation: Modeling and Verification Strategies , 1999 .
[22] Andrew A. Chien,et al. Architecture of a message-driven processor , 1987, ISCA '87.
[23] Mark D. Hill,et al. Weak ordering—a new definition , 1998, ISCA '98.
[24] Michael Gerndt,et al. SUPERB: A tool for semi-automatic MIMD/SIMD parallelization , 1988, Parallel Comput..
[25] Stefanos Kaxiras,et al. Kiloprocessor Extensions to SCI , 1996, Proceedings of International Conference on Parallel Processing.
[26] Marina C. Chen,et al. Compiling Communication-Efficient Programs for Massively Parallel Machines , 1991, IEEE Trans. Parallel Distributed Syst..
[27] Kirk L. Johnson,et al. High-performance all-software distributed shared memory , 1996 .
[28] Anant Agarwal,et al. APRIL: a processor architecture for multiprocessing , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[29] Monica S. Lam,et al. Global optimizations for parallelism and locality on scalable parallel machines , 1993, PLDI '93.
[30] Leslie Lamport,et al. How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs , 2016, IEEE Transactions on Computers.
[31] Anoop Gupta,et al. Complete computer system simulation: the SimOS approach , 1995, IEEE Parallel Distributed Technol. Syst. Appl..
[32] A. A. Chien,et al. A cost and speed model for k-ary n-cube wormhole routers , 1998 .
[33] James R. Larus,et al. Application-specific protocols for user-level shared memory , 1994, Proceedings of Supercomputing '94.
[34] Burton J. Smith. Architecture And Applications Of The HEP Multiprocessor Computer System , 1982, Optics & Photonics.
[35] Kirk L. Johnson. The impact of communication locality on large-scale multiprocessor performance , 1992, ISCA '92.
[36] Ricardo Bianchini,et al. Limits on the performance benefits of multithreading and prefetching , 1996, SIGMETRICS '96.
[37] Stein Gjessing,et al. Distributed-directory scheme: scalable coherent interface , 1990, Computer.
[38] Eric A. Brewer,et al. Remote queues: exposing message queues for optimization and atomicity , 1995, SPAA '95.
[39] David Chaiken,et al. The Alewife CMMU: Addressing the Multiprocessor Communications Gap , 1994 .
[40] J. Larus,et al. Tempest and Typhoon: user-level shared memory , 1994, Proceedings of 21st International Symposium on Computer Architecture.
[41] James R. Larus,et al. Where is time spent in message-passing and shared-memory programs? , 1994, ASPLOS VI.
[42] Anant Agarwal,et al. LimitLESS directories: A scalable cache coherence scheme , 1991, ASPLOS IV.
[43] James R. Larus,et al. Cooperative shared memory: software and hardware for scalable multiprocessor , 1992, ASPLOS V.
[44] Anne Rogers,et al. Process decomposition through locality of reference , 1989, PLDI '89.
[45] Seth Copen Goldstein,et al. Active Messages: A Mechanism for Integrated Communication and Computation , 1992, [1992] Proceedings the 19th Annual International Symposium on Computer Architecture.
[46] Anant Agarwal,et al. Directory-Based Cache Coherence in Large-Scale Multiprocessors , 1990 .
[47] Remzi H. Arpaci-Dusseau,et al. Empirical evaluation of the CRAY-T3D: a compiler perspective , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[48] James E. Smith,et al. The ZS-1 central processor , 1987, ASPLOS.
[49] Andrew A. Chien,et al. The J-Machine: A Fine-Grain Concurrent Computer , 1989, IFIP Congress.
[50] N. Madsen. Divergence preserving discrete surface integral methods for Maxwell's curl equations using non-orthogonal unstructured grids , 1995 .
[51] Daniel E. Lenoski,et al. Scalable Shared-Memory Multiprocessing , 1995 .
[52] Willy Zwaenepoel,et al. Adaptive software cache management for distributed shared memory architectures , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[53] Guang R. Gao,et al. Polling Watchdog: Combining Polling and Interrupts for Efficient Message Handling , 1996, International Symposium on Computer Architecture.
[54] David E. Culler,et al. Monsoon: an explicit token-store architecture , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[55] Charles L. Seitz,et al. The design of the Caltech Mosaic C multicomputer , 1993 .
[56] Robert H. B. Netzer,et al. Detecting data races on weak memory systems , 1991, [1991] Proceedings. The 18th Annual International Symposium on Computer Architecture.
[57] Anant Agarwal,et al. Anatomy of a Message in the Alewife Multiprocessor , 1993, The 8th IEEE Workshop on Computer Communications.
[58] Donald Yeung,et al. Multigrain shared memory , 2000, TOCS.
[59] Erik Hagersten,et al. DDM - A Cache-Only Memory Architecture , 1992, Computer.
[60] John L. Hennessy,et al. The performance advantages of integrating block data transfer in cache-coherent multiprocessors , 1994, ASPLOS VI.
[61] M. J. Beckerle,et al. *T: integrated building blocks for parallel computing , 1993, Supercomputing '93.
[62] Stefanos Kaxiras,et al. The GLOW cache coherence protocol extensions for widely shared data , 1996, ICS '96.
[63] Anoop Gupta,et al. Programming for Different Memory Consistency Models , 1992, J. Parallel Distributed Comput..
[64] Milon Mackey,et al. An implementation of the Hamlyn sender-managed interface architecture , 1996, OSDI '96.
[65] Anant Agarwal,et al. Directory-based cache coherence in large-scale multiprocessors , 1990, Computer.
[66] Victor Lee,et al. Exploiting two-case delivery for fast protected messaging , 1998, Proceedings 1998 Fourth International Symposium on High-Performance Computer Architecture.
[67] Norman P. Jouppi,et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[68] Michael L. Scott,et al. Algorithms for scalable synchronization on shared-memory multiprocessors , 1991, TOCS.
[69] Beng-Hong Lim,et al. Reactive synchronization algorithms for multiprocessors , 1994, ASPLOS VI.
[70] A. Agarwal,et al. MGS: A Multigrain Shared Memory System , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).
[71] Marc Snir,et al. The Communication Software and Parallel Environment of the IBM SP2 , 1995, IBM Syst. J..
[72] Anoop Gupta,et al. The Stanford Dash multiprocessor , 1992, Computer.
[73] Anoop Gupta,et al. Exploring The Benefits Of Multiple Hardware Contexts In A Multiprocessor Architecture: Preliminary Results , 1989, The 16th Annual International Symposium on Computer Architecture.
[74] W. Daniel Hillis,et al. The Network Architecture of the Connection Machine CM-5 , 1996, J. Parallel Distributed Comput..
[75] Chris J. Scheiman,et al. Experience with active messages on the Meiko CS-2 , 1995, Proceedings of 9th International Parallel Processing Symposium.
[76] A. Gupta,et al. Exploring the benefits of multiple hardware contexts in a multiprocessor architecture: preliminary results , 1989, ISCA '89.
[77] Anoop Gupta,et al. Integration of message passing and shared memory in the Stanford FLASH multiprocessor , 1994, ASPLOS VI.
[78] William J. Dally. Virtual-channel flow control , 1990, ISCA '90.
[79] Dana S. Henry,et al. A tightly-coupled processor-network interface , 1992, ASPLOS V.
[80] Anant Agarwal,et al. Closing the window of vulnerability in multiphase memory transactions , 1992, ASPLOS V.
[81] David Chaiken,et al. Mechanisms and interfaces for software-extended coherent shared memory , 1994 .
[82] Shekhar Y. Borkar,et al. Supporting systolic and memory communication in iWarp , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[83] William J. Dally,et al. Deadlock-Free Adaptive Routing in Multicomputer Networks Using Virtual Channels , 1993, IEEE Trans. Parallel Distributed Syst..
[84] Rajeev Barua,et al. The sensitivity of communication mechanisms to bandwidth and latency , 1998, Proceedings 1998 Fourth International Symposium on High-Performance Computer Architecture.
[85] William J. Dally,et al. Deadlock-Free Message Routing in Multiprocessor Interconnection Networks , 1987, IEEE Transactions on Computers.
[86] Steven L. Scott,et al. Synchronization and communication in the T3E multiprocessor , 1996, ASPLOS VII.
[87] Ken Kennedy,et al. Software prefetching , 1991, ASPLOS IV.
[88] Anoop Gupta,et al. Tolerating Latency Through Software-Controlled Prefetching in Shared-Memory Multiprocessors , 1991, J. Parallel Distributed Comput..
[89] Eric A. Brewer,et al. PROTEUS: a high-performance parallel-architecture simulator , 1992, SIGMETRICS '92/PERFORMANCE '92.
[90] Colin Whitby-Strevens. The transputer , 1985, ISCA 1985.
[91] Andrew A. Chien,et al. The Cost of Adaptivity and Virtual Lanes in a Wormhole Router , 1995 .
[92] Kirk L. Johnson,et al. CRL: high-performance all-software distributed shared memory , 1995, SOSP.
[93] D. Lenoski,et al. The SGI Origin: A ccNUMA Highly Scalable Server , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[94] Anoop Gupta,et al. The directory-based cache coherence protocol for the DASH multiprocessor , 1990, ISCA '90.
[95] Shekhar Y. Borkar,et al. iWarp: an integrated solution to high-speed parallel computing , 1988, Proceedings. SUPERCOMPUTING '88.
[96] Anant Agarwal,et al. FUGU: Implementing Translation and Protection in a Multiuser, Multimodel Multiprocessor , 1994 .
[97] Allan Porterfield,et al. Exploiting heterogeneous parallelism on a multithreaded multiprocessor , 1992, ICS '92.
[98] William J. Dally,et al. The J-machine Multicomputer: An Architectural Evaluation , 1993, Proceedings of the 20th Annual International Symposium on Computer Architecture.
[99] Peter Druschel,et al. Experiences with a high-speed network adaptor: a software perspective , 1994, SIGCOMM 1994.
[100] Frederic T. Chong,et al. Parallel Communication Mechanisms for Sparse, Irregular Applications , 1997 .
[101] Babak Falsafi,et al. Coherent Network Interfaces for Fine-Grain Communication , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).
[102] William A. Wulf,et al. Evaluation of the WM Architecture , 1992, [1992] Proceedings the 19th Annual International Symposium on Computer Architecture.
[103] Richard A. Lethin,et al. Message-driven dynamics , 1997 .