General-purpose microprocessors are the devices that have fueled the personal computer and internet revolution of the past couple of decades. A processor is at the heart of every computer system in use today, from tiny autonomous embedded control systems to large-scale, powerful, networked supercomputers. The Compaq Alpha microprocessor line fits into the high-performance end of this spectrum, powering high-end workstations, servers, and supercomputers. Over the years, processor architecture and design have always had to respond to, and move in lock-step with, technological advances in processor implementation and integrated circuit technology, as well as programming paradigms and instruction sets. In this paper I review some of the trends that have driven the Compaq Alpha processor architecture in the past decade and give an outlook on current and future trends that will have an impact on the architecture of future Alpha processors.

Introduction

A processor is at the heart of every computer system that we build today. Around this processor, you find several other components that make up a computer: memory for instruction and data storage, and input-output devices that communicate with the rest of the world, such as disk controllers, graphics cards, keyboard interfaces, and network adapters. The purpose of the processor is to execute machine instructions. Thus, the logical operation of a processor is defined by the instruction set architecture (ISA) that it executes. Multiple different processors can implement the same ISA. What differentiates such processors is their processor architecture, which is the way that each processor is organized internally in order to implement its ISA. By changing the processor architecture, a processor designer can influence the performance characteristics and the efficiency with which instructions are executed.
Processor architecture also has to respond to implementation constraints imposed on it by the target circuit technology of the chip in order to achieve a set performance goal. In the rest of this section I give a short crash course in advanced computer architecture and an overview of the state of the art in processor architecture for general-purpose high-performance microprocessors.

Early Architectures

In early computer architectures, processor operation was very simple and strictly sequential. In the first step, the program counter (PC) is used to send the address of the next instruction to memory. Potentially several clock cycles later, the instruction is returned from memory. The instruction is then decoded. Decoding produces a list of source and destination operands that the instruction operates on, and the specific operation that is to be performed. In the next step, the source operands are accessed and delivered to the arithmetic-logic unit (ALU). The ALU then performs the operation that was specified in the instruction and delivers a result. The result is written back to the destination that was decoded. Finally, the PC is updated to advance to the next instruction that is to be executed, after which the whole process starts from the beginning for that instruction.

It is easy to see that in this type of design, many operations of the processor are unnecessarily serialized and large portions of the processor sit idle for the majority of the time. For example, the ALU is only busy while the operation is performed on the source operands, but sits idle during the rest of the time it takes to execute an instruction. It is not uncommon for an instruction to take on the order of 10 clock cycles to execute. Processor architects often quote processor performance in instructions executed per clock cycle (IPC). At 10 cycles per instruction, this simple processor architecture achieves a performance of about 0.1 IPC [4].
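
The arithmetic behind this figure can be illustrated with a minimal sketch of a strictly sequential processor. The per-step cycle counts below are assumptions, chosen only so that they sum to the 10 cycles mentioned above; they do not describe any particular machine, and the names `STEP_CYCLES` and `run_sequential` are hypothetical.

```python
# Illustrative model of a strictly sequential (non-pipelined) processor.
# The per-step cycle counts are assumptions chosen to sum to 10 cycles;
# they are not measurements of any real machine.
STEP_CYCLES = {
    "fetch": 3,          # send PC to memory and wait for the instruction
    "decode": 1,         # extract operands and operation
    "operand_read": 2,   # deliver source operands to the ALU
    "execute": 2,        # ALU performs the operation
    "write_back": 1,     # store the result in the destination
    "pc_update": 1,      # advance PC to the next instruction
}


def run_sequential(num_instructions):
    """Return (total_cycles, ipc, alu_utilization) for a sequential machine."""
    # Every instruction passes through all steps before the next one starts.
    cycles_per_instruction = sum(STEP_CYCLES.values())
    total_cycles = num_instructions * cycles_per_instruction
    ipc = num_instructions / total_cycles
    # The ALU is busy only during the execute step of each instruction.
    alu_busy_cycles = num_instructions * STEP_CYCLES["execute"]
    alu_utilization = alu_busy_cycles / total_cycles
    return total_cycles, ipc, alu_utilization


total, ipc, alu_util = run_sequential(1000)
print(f"cycles={total} IPC={ipc:.2f} ALU busy={alu_util:.0%}")
```

Under these assumed latencies, 1000 instructions take 10000 cycles, which gives 0.1 IPC, and the ALU is busy for only one fifth of the time.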
A driving force behind this design was the scarcity of resources. The number of transistors available on a processor chip was low (tens of thousands), so much of the design effort had to go into limiting the number of transistors needed to implement each function of the processor. Efficiency could only be addressed once functionality was satisfied, which left little freedom.

Pipelining

To achieve higher performance, the various operations involved in executing a single instruction can be separated into different stages of a pipeline and performed in parallel for multiple instructions. Since the pipeline can only advance at the rate of its slowest stage, it is advantageous for all instructions to have approximately the same amount of work to do in each pipeline stage. The architectural shift to a pipelined design goes hand in hand with a shift in the predominant ISAs of the time from complex instruction sets (CISC) to simpler, reduced instruction sets (RISC). In RISC instruction sets, each instruction performs only a simple operation that can be executed in a short pipeline stage. All instructions have a similar amount of work to perform, which supports the shift to pipelined architectures.

Artur Klauser TU Graz, Telematik, Dipl.-Ing. 1994 Univ. of Colorado at Boulder, Computer
[1] N. P. Jouppi. Architectural and organizational tradeoffs in the design of the MultiTitan CPU, 1989, ISCA '89.
[2] A. Kumar, et al. A 1.2 GHz Alpha microprocessor with 44.8 GB/s chip pin bandwidth, 2001, 2001 IEEE International Solid-State Circuits Conference. Digest of Technical Papers. ISSCC (Cat. No.01CH37177).
[3] Doug Hunt, et al. Advanced performance features of the 64-bit PA-8000, 1995, Digest of Papers. COMPCON'95. Technologies for the Information Superhighway.
[4] Norman P. Jouppi, et al. Architectural and organizational tradeoffs in the design of the MultiTitan CPU, 1989, The 16th Annual International Symposium on Computer Architecture.
[5] Kourosh Gharachorloo, et al. Architecture and design of AlphaServer GS320, 2000, SIGP.
[6] Larry L. Biro, et al. Power considerations in the design of the Alpha 21264 microprocessor, 1998, Proceedings 1998 Design and Automation Conference. 35th DAC. (Cat. No.98CH36175).
[7] Dean M. Tullsen, et al. Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading, 1997, TOCS.
[8] William J. Bowhill, et al. Circuit Implementation of a 300-MHz 64-bit Second-generation CMOS Alpha CPU, 1995, Digit. Tech. J.
[9] David A. Patterson, et al. Computer Architecture: A Quantitative Approach, 1969.
[10] Richard E. Kessler, et al. The Alpha 21264 microprocessor, 1999, IEEE Micro.
[11] Ruben W. Castelino, et al. Internal Organization of the Alpha 21164, a 300-MHz 64-bit Quad-issue CMOS RISC Microprocessor, 1995, Digit. Tech. J.
[12] Kenneth C. Yeager. The MIPS R10000 superscalar microprocessor, 1996, IEEE Micro.
[13] Douglas W. Clark, et al. Retrospective: characterization of processor performance in the VAX-11/780, 1998, International Symposium on Computer Architecture.
[14] R. Allmon, et al. High-performance microprocessor design, 1998, IEEE J. Solid State Circuits.
[15] Luiz André Barroso, et al. Piranha: a scalable architecture based on single-chip multiprocessing, 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[16] Luiz André Barroso, et al. Impact of chip-level integration on performance of OLTP workloads, 2000, Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6 (Cat. No.PR00550).
[17] Richard E. Kessler, et al. The Alpha 21264 microprocessor architecture, 1998, Proceedings International Conference on Computer Design. VLSI in Computers and Processors (Cat. No.98CB36273).
[18] T. Xanthopoulos, et al. The design and analysis of the clock distribution network for a 1.2 GHz Alpha microprocessor, 2001, 2001 IEEE International Solid-State Circuits Conference. Digest of Technical Papers. ISSCC (Cat. No.01CH37177).