Trends in High-Performance Computer Architecture

Technological advances have led to microprocessors running at 300 MHz, to memory capacities of hundreds of megabytes per processing element, and to parallel processors with a few hundred nodes. In this paper, we review advances in processor design, techniques to reduce memory latency, and means of interconnecting powerful nodes. In the case of processors, we emphasize the implementation of techniques such as speculative execution based on branch prediction and out-of-order execution, and we describe alternative ways to handle vector data. While memory capacity has been increasing, so, relatively speaking, has memory latency. We describe how add-ons to caches and cache hierarchies can help reduce memory latency. In the case of shared-memory multiprocessors, we show how relaxing sequential-consistency constraints is another way to reduce it. In the case of interconnects, there is no definitive trend in topology at this time; it appears that the real problem is latency, not bandwidth.
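
To make the branch-prediction mechanism behind speculative execution concrete, the following is a minimal sketch (not taken from the paper) of a 2-bit saturating-counter branch predictor in C. The table size, the PC-based indexing, and the example program counter are illustrative assumptions, not details from the survey.

/*
 * Sketch of a 2-bit saturating-counter branch predictor.
 * Counter states 0 and 1 predict "not taken"; 2 and 3 predict "taken".
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define BHT_ENTRIES 1024            /* assumed branch-history-table size */

static uint8_t bht[BHT_ENTRIES];    /* zero-initialized: "strongly not taken" */

static bool predict(uint32_t pc)
{
    return bht[(pc >> 2) % BHT_ENTRIES] >= 2;   /* index by word-aligned PC */
}

static void update(uint32_t pc, bool taken)
{
    uint8_t *ctr = &bht[(pc >> 2) % BHT_ENTRIES];
    if (taken && *ctr < 3)
        (*ctr)++;                   /* saturate at "strongly taken" */
    else if (!taken && *ctr > 0)
        (*ctr)--;                   /* saturate at "strongly not taken" */
}

int main(void)
{
    /* A branch at a hypothetical PC that is taken 9 times, then not taken. */
    uint32_t pc = 0x400100;
    int correct = 0;
    for (int i = 0; i < 10; i++) {
        bool outcome = (i < 9);
        if (predict(pc) == outcome)
            correct++;
        update(pc, outcome);
    }
    printf("correct predictions: %d/10\n", correct);
    return 0;
}

The two-bit counter adds hysteresis: a single misprediction (for example, the final loop exit above) does not flip the prediction, so a strongly biased branch continues to be predicted correctly on its next occurrence.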