Pentium 4 (partially) previewed

Intel has released a few more details of its next IA-32 processor, formerly code-named Willamette and now slated to be sold as the Pentium 4. At the Intel Developer Forum last week, Intel's CEO Craig Barrett and vice president Albert Yu provided interesting insights into the design goals and microarchitecture of the Pentium 4's new core.

The announcement answered some questions but raised others. For example, the P4 will be announced at "at least" 1.4GHz, according to Intel, but the company has said nothing about the P4's performance relative to the Pentium 3, currently shipping at 1.13GHz.

The higher operating frequency of the new part is made possible by a hyperpipelined core with 20 stages. Intel calls this new architecture NetBurst rather than P7 or some other sequential code name of the type used in the past. As Figure 1 shows, the NetBurst pipeline is twice as deep as that of the P6, which in turn was twice as deep as the P5's. Increasing pipeline depth increases logic complexity and branch penalties, but it also allows clock speeds to increase. We expect the new core to reach 2GHz, a speed demonstrated at IDF, before it moves to a 0.13-micron process in 2001.

The two Drive stages shown in Figure 1 represent the time required to move signals across the chip. No other work is done during these stages. As far as we know, NetBurst is the first pipeline with dedicated stages for wire delays.

Although the new pipeline has one execution stage, the P4's two ALUs execute many operations in one-half of a clock period. Shifts and some other operations still spend one full clock period in the ALU, and these operations must start at the beginning of the period. Since the rest of the pipeline can process only two ALU operations per clock, the faster ALUs don't increase peak throughput, but they do boost sustained throughput. When two ALU operations are ready for execution, one of which depends on the results of the other, the Pentium 4 can complete the first operation in the

Figure 1. The new hyperpipelined NetBurst microarchitecture of the Pentium 4 allows its clock rate to be increased significantly over that of the Pentium 3. This figure, based on information provided by Intel, shows the portion of each pipeline involved in ALU operations under branch mispredictions.
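
The trade-off noted above, in which a deeper pipeline raises the branch penalty but also permits a faster clock, can be made concrete with a rough back-of-the-envelope model. The sketch below is illustrative only and is not a performance estimate for either chip. The branch fraction, misprediction rate, and base CPI are assumed values, as is the simplification that the misprediction penalty equals the stage count (10 for the P6 pipeline, 20 for NetBurst). Only the stage counts and the 1.13GHz and 1.4GHz clock rates come from the article.

```python
# Rough model of the depth-versus-clock trade-off. Everything except the
# stage counts and clock rates quoted in the article is an assumption.

def time_per_instruction_ns(clock_ghz, mispredict_penalty_cycles,
                            branch_fraction=0.2, mispredict_rate=0.05,
                            base_cpi=1.0):
    """Average time per instruction in nanoseconds.

    branch_fraction, mispredict_rate, and base_cpi are hypothetical
    workload parameters, not figures published by Intel.
    """
    # Each mispredicted branch adds a fixed number of stall cycles.
    cpi = base_cpi + branch_fraction * mispredict_rate * mispredict_penalty_cycles
    # With the clock in GHz, cycles divided by GHz gives nanoseconds.
    return cpi / clock_ghz

# Assumption: penalty in cycles equals the pipeline depth shown in Figure 1.
p3 = time_per_instruction_ns(clock_ghz=1.13, mispredict_penalty_cycles=10)
p4 = time_per_instruction_ns(clock_ghz=1.4, mispredict_penalty_cycles=20)

print(f"Pentium 3 model: {p3:.3f} ns per instruction")
print(f"Pentium 4 model: {p4:.3f} ns per instruction")
```

Under these assumed parameters the deeper pipeline still comes out ahead, because the clock gain outweighs the doubled penalty. Raising the assumed misprediction rate narrows and eventually reverses the gap, which is why the real relative performance cannot be inferred from clock rate alone.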