Balanced Distributed Memory Parallel Computers

The mismatch between the speed of high-performance on-chip CPUs and data access times is the basic reason for the increasing gap between peak and sustained performance in distributed memory parallel computers. We propose the concept of balanced architectures, based on a network with a dynamic topology and communication patterns determined at compile time. The corresponding processing element (PE) is a cacheless CPU that can achieve a rate of 1 FLOP per clock cycle. We present the features of the network and of the PE. An example shows that balanced architectures maintain their efficiency as they scale.