Algorithms and design: the CRAY APP shared-memory system

Analysis of the fundamental algorithms of computational science drove the design of the CRAY APP system. The characteristics central to many applications are exploited through shared-memory programming techniques and existing compiler technology. The CRAY APP, a cluster-capable 84-processor system, provides a flat shared memory, low memory latency, fast barrier synchronization, and hardware-assisted parallel support. A patented crossbar/bus architecture makes the system economical. Deterministic system behavior allows the compilers to view the system as a single virtual processor. For even higher performance, multiple CRAY APPs can be clustered, and cluster configurations may also contain a globally accessible memory. High-bandwidth, low-latency connections make such configurations effective for applications that require more performance than a single CRAY APP can deliver.