Increasing cache bandwidth using multi-port caches for exploiting ILP in non-numerical code

Modern microprocessors that exploit instruction-level parallelism (ILP) require higher data cache bandwidth than sequential machines, since at least the same number of cache references (or more, due to speculative execution) must be issued in fewer clock cycles. To supply this bandwidth, multiport caches have been proposed, which allow multiple load/store instructions to execute in a single cycle. Determining the minimum number of cache ports that does not significantly degrade performance is one of the major resource allocation problems in ILP machines. Unfortunately, this decision is difficult to make for machines that exploit ILP in non-numerical code, owing to the irregularity of such code. In this short paper, a comprehensive empirical study is performed on selected integer benchmarks using an aggressive ILP compiler, with the aims of characterising a suitable number of cache ports and evaluating the cache bandwidth metric. This study differs from previous ones in that the compiler can successfully exploit ILP in proportion to the available resources, so the performance impact of multiport caches is measured more accurately. The results indicate that multiport caches providing high bandwidth significantly improve the performance of ILP machines (by as much as 50% in geometric mean), yet there is a consistent upper bound on the number of ports required.
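The idea that extra ports stop helping beyond some point can be illustrated with a deliberately simplified model. The sketch below is not the study's simulation methodology. It assumes a hypothetical compiler schedule given as a list of memory-reference counts per cycle, and assumes that a cycle containing k references on a p-port cache stretches to ceil(k/p) cycles (never fewer than one). The function names and the two traces are invented for illustration only.

```python
import math
from statistics import geometric_mean


def cycles_with_ports(mem_ops_per_cycle, ports):
    """Toy model (assumption): a scheduled cycle with k memory references
    takes ceil(k / ports) cycles on a cache with `ports` ports, and at
    least one cycle even when it has no memory references."""
    if ports < 1:
        raise ValueError("ports must be >= 1")
    return sum(max(1, math.ceil(k / ports)) for k in mem_ops_per_cycle)


def speedup_over_single_port(mem_ops_per_cycle, ports):
    """Ratio of single-port cycles to `ports`-port cycles under the toy model."""
    return cycles_with_ports(mem_ops_per_cycle, 1) / cycles_with_ports(
        mem_ops_per_cycle, ports
    )


# Hypothetical per-cycle memory-reference counts for two made-up benchmarks.
traces = {
    "bench_a": [0, 2, 1, 3, 0, 1, 2, 0],
    "bench_b": [1, 1, 4, 0, 2, 0, 1, 1],
}

# Summarise each port count by the geometric mean of per-benchmark speedups.
for p in (1, 2, 4, 8):
    speedups = [speedup_over_single_port(t, p) for t in traces.values()]
    print(p, round(geometric_mean(speedups), 3))
```

With these made-up traces, moving from two to four ports still reduces the cycle count, but eight ports give no further gain because no cycle issues more than four references. This is only a toy analogue of the saturation the study reports for real benchmarks.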