SPARC-based VLIW testbed

The performance of very long instruction word (VLIW) microprocessors depends on close co-operation between the compiler and the architecture. Designing a high-performance VLIW machine therefore requires a testbed that allows compilation techniques and architectural features to be evaluated together in detail. The paper introduces a new VLIW testbed based on the SPARC instruction set architecture, comprising an aggressive scheduling compiler and a fast VLIW simulator. The compiler takes optimised SPARC code generated by gcc as input and produces parallelised VLIW code for advanced VLIW architectures. It generates high-performance VLIW code, especially for non-numerical integer programs. The VLIW code is translated into a dedicated C program, which gives fast and simple compiled simulation and produces detailed performance data. The authors have performed a comprehensive empirical study on the testbed for both large-resource and small-resource machines. The results show that a geometric-mean speedup of up to fourfold is obtainable on nontrivial integer benchmarks, without using branch probability information when performing speculative code motion. The characteristics of useful and useless ALU operations in each cycle are also analysed to show how the speedup is obtained. The analysis indicates that around half of the useful ALUs execute speculative instructions whose original paths are taken (i.e. "hit" speculation), yet a substantial number of ALUs are also wasted on useless speculative execution or copy execution.
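The idea of translating VLIW code into a C program for compiled simulation can be pictured with a small hand-written sketch. Everything in it is hypothetical and is not the testbed's actual instruction format or generated code: the four-ALU machine (`NUM_ALUS`), the registers `r1`/`r2`, the toy program that sums 0..n-1, and the names `simulate_sum`, `SimStats` and `alu_utilisation`. It only illustrates the general technique: each long instruction word becomes a straight-line block of C, branches between words become `goto`s, all operands in a word are read before any result is written (modelling parallel issue), and counters record cycles and busy ALU slots as the program runs.

```c
#include <assert.h>

/* Hypothetical machine parameter: ALU slots per long instruction word. */
#define NUM_ALUS 4

/* Counters a compiled simulator could update as it runs. */
typedef struct {
    long cycles;   /* long instruction words issued (one per cycle) */
    long alu_ops;  /* ALU slots that executed an operation */
} SimStats;

/*
 * Hand-written compiled simulation of a toy VLIW program that sums 0..n-1.
 * Each VLIW word is one block of C; branches between words are gotos.
 */
long simulate_sum(long n, SimStats *st)
{
    long r1, r2;  /* simulated registers: r1 = loop index, r2 = running sum */
    st->cycles = 0;
    st->alu_ops = 0;

    /* word 0: r1 = 0 | r2 = 0 */
    r1 = 0;
    r2 = 0;
    st->cycles += 1;
    st->alu_ops += 2;

W1: /* word 1: compare r1 with n; branch to word 3 if r1 >= n */
    st->cycles += 1;
    st->alu_ops += 1;
    if (r1 >= n)
        goto W3;

    /* word 2: r2 = r2 + r1 | r1 = r1 + 1 | jump to word 1 */
    {
        long t2 = r2 + r1;  /* read both sources first ... */
        long t1 = r1 + 1;
        r2 = t2;            /* ... then write both results */
        r1 = t1;
    }
    st->cycles += 1;
    st->alu_ops += 2;
    goto W1;

W3: /* word 3: halt; r2 holds the result */
    return r2;
}

/* Fraction of the available ALU slots that executed an operation. */
double alu_utilisation(const SimStats *st)
{
    /* A word can never use more slots than the machine provides. */
    assert(st->alu_ops <= st->cycles * NUM_ALUS);
    return st->cycles
        ? (double)st->alu_ops / (double)(st->cycles * NUM_ALUS)
        : 0.0;
}
```

Calling `simulate_sum(10, &st)` returns 45 after 22 simulated cycles with 33 busy ALU slots, a utilisation of 0.375 on this four-ALU machine; a scheduler that packed more independent operations into each word would raise that figure. Because the VLIW program is compiled into C rather than interpreted, the simulator has no per-instruction fetch and decode loop, which is the usual source of the speed of compiled simulation.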