Trends in High-Performance Computer Architecture

Technological advances have led to microprocessors running at 300 MHz, to memory capacities of hundreds of megabytes per processing element, and to parallel processors with a few hundred nodes. In this paper, we review advances in processor design, techniques to reduce memory latency, and means of interconnecting powerful nodes. In the case of processors, we emphasize the implementation of techniques such as speculative execution based on branch prediction and out-of-order execution, and we describe alternative ways to handle vector data. While memory capacity has been increasing, so, relatively speaking, has memory latency. We describe how add-ons to caches and cache hierarchies can help reduce memory latency. In the case of shared-memory multiprocessors, we show how relaxing sequential-consistency constraints is another way to reduce it. In the case of interconnects, there is no definitive trend in topology at this time; it appears that the real problem is latency, not bandwidth.
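
To make the branch-prediction mechanism behind speculative execution concrete, the following is a minimal sketch (not taken from the paper) of a 2-bit saturating-counter branch predictor in C. The table size, the PC-based indexing, and the example program counter are illustrative assumptions, not details from the survey.

/*
 * Sketch of a 2-bit saturating-counter branch predictor.
 * Counter states 0 and 1 predict "not taken"; 2 and 3 predict "taken".
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define BHT_ENTRIES 1024            /* assumed branch-history-table size */

static uint8_t bht[BHT_ENTRIES];    /* zero-initialized: "strongly not taken" */

static bool predict(uint32_t pc)
{
    return bht[(pc >> 2) % BHT_ENTRIES] >= 2;   /* index by word-aligned PC */
}

static void update(uint32_t pc, bool taken)
{
    uint8_t *ctr = &bht[(pc >> 2) % BHT_ENTRIES];
    if (taken && *ctr < 3)
        (*ctr)++;                   /* saturate at "strongly taken" */
    else if (!taken && *ctr > 0)
        (*ctr)--;                   /* saturate at "strongly not taken" */
}

int main(void)
{
    /* A branch at a hypothetical PC that is taken 9 times, then not taken. */
    uint32_t pc = 0x400100;
    int correct = 0;
    for (int i = 0; i < 10; i++) {
        bool outcome = (i < 9);
        if (predict(pc) == outcome)
            correct++;
        update(pc, outcome);
    }
    printf("correct predictions: %d/10\n", correct);
    return 0;
}

The two-bit counter adds hysteresis: a single misprediction (for example, the final loop exit above) does not flip the prediction, so a strongly biased branch continues to be predicted correctly on its next occurrence.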