The impact of pipeline length on the performance of a microprocessor is explored both theoretically and by simulation. An analytical theory is presented showing that two architectural parameters have opposing effects on the optimal pipeline length: a higher degree of instruction-level parallelism (superscalar issue) decreases the optimal pipeline length, while a lower incidence of pipeline stalls increases it. This theory is tested by analyzing the optimal pipeline length for 35 applications representing three classes of workloads. Trace tapes are collected from SPEC95 and SPEC2000 applications, traditional (legacy) database and online transaction processing (OLTP) applications, and modern applications primarily written in Java and C++. The results show a clear and significant difference in optimal pipeline length between the SPEC workloads and both the legacy and modern applications. The SPEC applications, written in C, optimize to a shorter pipeline length than the legacy applications, largely written in assembly language, with relatively little overlap between the two distributions. Additionally, the optimal pipeline length distribution for the C++ and Java workloads overlaps with that of the legacy applications, suggesting similar workload characteristics. These results are explored across a wide range of superscalar processors, both in-order and out-of-order.
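The two opposing effects named above can be illustrated with a minimal sketch. It assumes a simple cost model that is not stated in this abstract: each instruction costs 1/alpha cycles when nothing stalls, each hazard costs a fraction gamma of the pipeline depth in stall cycles, and the cycle time is a fixed latch overhead plus the total logic delay divided by the depth. All names (`alpha`, `hazard_rate`, `gamma`, `t_p`, `t_o`) and the numeric values are hypothetical, chosen only to show the direction of each effect.

```python
import math


def time_per_instruction(p, alpha, hazard_rate, gamma, t_p, t_o):
    """Average time per instruction for a pipeline of depth p (assumed model).

    alpha       -- average instructions issued per cycle (superscalar degree)
    hazard_rate -- hazards per instruction
    gamma       -- fraction of the pipeline depth lost per hazard
    t_p         -- total logic delay of the unpipelined datapath
    t_o         -- latch overhead added by each stage
    """
    cycle_time = t_o + t_p / p              # per-stage logic delay plus latch overhead
    busy_cycles = 1.0 / alpha               # cycles per instruction without stalls
    stall_cycles = gamma * hazard_rate * p  # stall cycles per instruction grow with depth
    return (busy_cycles + stall_cycles) * cycle_time


def optimal_depth(alpha, hazard_rate, gamma, t_p, t_o):
    """Depth that minimizes time_per_instruction, from setting dT/dp = 0."""
    return math.sqrt(t_p / (alpha * gamma * hazard_rate * t_o))


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    params = dict(gamma=0.5, t_p=140.0, t_o=2.5)
    for alpha, hazard_rate in [(1.0, 0.1), (2.0, 0.1), (4.0, 0.1), (2.0, 0.05)]:
        p_opt = optimal_depth(alpha, hazard_rate, **params)
        print(f"alpha={alpha}, hazard_rate={hazard_rate}: optimal depth ~ {p_opt:.1f}")
```

Under these assumptions the optimal depth scales as the inverse square root of both `alpha` and `hazard_rate`, so wider issue shortens the optimal pipeline while fewer stalls lengthen it, which matches the qualitative claim in the abstract.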