Analysis of Avalanche's Shared Memory Architecture

In this paper, we describe the design of the Avalanche multiprocessor's shared memory subsystem, evaluate its performance, and discuss problems associated with using commodity workstations and network interconnects as the building blocks of a scalable shared memory multiprocessor. Compared to other scalable shared memory architectures, Avalanche has a number of novel features, including its support for the Simple COMA memory architecture and its support for multiple coherency protocols (migratory, delayed write update, and, soon, write invalidate). We describe the performance implications of Avalanche's architecture and the impact of various novel low-level design options, and describe a number of interesting phenomena we encountered while developing a scalable multiprocessor built on the HP PA-RISC platform.

Ravindra Kuramkote, John Carter, Alan Davis, Chen-Chi Kuo, Leigh Stoller, Mark Swanson
Computer Systems Laboratory, University of Utah
