Analysis and Characterization of Intel Itanium Instruction Bundles for Improving VLIW Processor Performance

To achieve high instruction-level parallelism (ILP), designers are turning to very long instruction word (VLIW) designs, in which different types of instructions are grouped together as bundles of 128 bits or longer. In VLIW, the added nops increase code size and limit processor performance by leaving functional units under-utilized. To examine these performance issues of VLIW systems, we consider Intel's first 64-bit architecture, the IA-64, and its first implementation, the Itanium, which employs Intel's version of VLIW. We present a comprehensive analysis of the under-utilization caused by nops and stops across a wide range of application domains, using three benchmark suites: SPEC CPU 2000, MediaBench, and PacketBench. Our results show that, on average, nops create an under-utilization factor of 28.46% in SPEC CPU, 32.27% in MediaBench, and 29.76% in PacketBench. We also analyze the characteristics of different instruction bundle formats, which we obtain by collecting statistics concerning the frequency of the bundle formats
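As a rough illustration of the kind of measurement described above, the following Python sketch splits a 128-bit IA-64 bundle into its 5-bit template and three 41-bit instruction slots, then computes the share of slots holding nops and the frequency of each bundle format. It is not the authors' tooling: the function names, the input format (already-disassembled bundles given as a template name plus three mnemonic strings), and the reading of "under-utilization factor" as the fraction of slots occupied by nops are all assumptions made for this example.

```python
from collections import Counter

TEMPLATE_BITS = 5   # IA-64: low 5 bits of a bundle select the template
SLOT_BITS = 41      # followed by three 41-bit instruction slots


def split_bundle(bundle: int):
    """Split a 128-bit bundle into (template, [slot0, slot1, slot2])."""
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slot_mask = (1 << SLOT_BITS) - 1
    slots = [
        (bundle >> (TEMPLATE_BITS + SLOT_BITS * i)) & slot_mask
        for i in range(3)
    ]
    return template, slots


def nop_fraction(bundles):
    """Fraction of instruction slots that hold a nop.

    `bundles` is a hypothetical input format: an iterable of
    (template_name, (mnemonic0, mnemonic1, mnemonic2)) pairs, such as
    one might extract from disassembler output. Assumed here to stand
    in for the "under-utilization factor".
    """
    total = 0
    nops = 0
    for _template, mnemonics in bundles:
        for mnemonic in mnemonics:
            total += 1
            if mnemonic.lstrip().startswith("nop"):
                nops += 1
    return nops / total if total else 0.0


def template_histogram(bundles):
    """Count how often each bundle format (template name) occurs."""
    return Counter(template for template, _mnemonics in bundles)


# Made-up sample data, for illustration only.
sample = [
    ("MII", ("ld8 r1=[r2]", "nop.i 0", "nop.i 0")),
    ("MFB", ("nop.m 0", "nop.f 0", "br.ret b0")),
]
```

On the made-up sample, four of the six slots are nops, so `nop_fraction(sample)` is 4/6. A real measurement would feed in disassembled benchmark binaries and would also need to handle predicated instructions, which this sketch ignores.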
