Trace cache design for wide-issue superscalar processors
暂无分享,去创建一个
To maximize the performance of a wide-issue superscalar processor, the fetch mechanism must be capable of delivering at least the same instruction bandwidth as the execution mechanism is capable of consuming. Fetch mechanisms consisting of a simple instruction cache are limited by difficulty in fetching a branch and its taken target in a single cycle. Such fetch mechanisms will not suffice for processors capable of executing multiple basic blocks' worth of instructions.
The Trace Cache is proposed to deal with lost fetch bandwidth due to branches. The trace cache is a structure which overcomes this partial fetch problem by storing logically contiguous instructions-instructions which are adjacent in the instruction stream-in physically contiguous storage. In this manner, the trace cache is able to deliver multiple non-contiguous blocks each cycle.
This dissertation contains a description of the trace cache mechanism for a 16-wide issue processor, along with an evaluation of basic parameters of this mechanism, such as relative size and associativity. The main contributions of this dissertation are a series of trace cache enhancements which boost instruction fetch bandwidth by 34% and overall performance by 14% over an aggressive instruction cache. Also included is an analysis of two important performance limitations of the trace cache: branch resolution time and instruction duplication.