IPULOC — Exploring dynamic program locality with the instruction processing unit for filling memory gap

The memory gap has become a key factor limiting the peak performance of systems built on high-speed CPUs. The traditional way to narrow this gap has been to enlarge cache capacity, relying on the principle of static program locality. However, the order of the instructions held in the I-Cache before they are sent to the Data Processing Unit (DPU) is useful information that has never been exploited. An architecture containing an Instruction Processing Unit (IPU) that works in parallel with the ordinary DPU is therefore proposed. The IPU can prefetch, analyze and preprocess a large number of instructions that would otherwise lie untouched in the I-Cache, which makes it more effective than a conventional prefetch buffer that can hold only a few instructions for previewing. With the IPU, load instructions can be preprocessed while the DPU is simultaneously operating on data. The architecture is termed “Instruction Processing Unit with LOokahead Cache” (IPULOC for short), and it introduces the idea of dynamic program locality. This paper describes the principle of IPULOC and presents quantitative parameters for evaluating it. Tools for simulating IPULOC have been developed. The simulation results show that IPULOC improves program locality during execution and therefore raises the cache hit ratio correspondingly, without further enlarging the on-chip cache, which already occupies a large portion of chip area.
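The lookahead idea can be illustrated with a toy trace-driven model. This is a minimal sketch under stated assumptions, not the IPULOC simulator described in the paper. It assumes a single fully associative LRU data cache with 64-byte lines, and an IPU that inspects the instruction a fixed number of slots ahead of the DPU and, if that instruction is a load, brings its line into the cache early. The names `LRUCache`, `hit_ratio`, `LINE_SIZE` and the `lookahead` parameter are hypothetical. The actual IPULOC lookahead cache may be organized quite differently (for example, as a structure separate from the data cache).

```python
from collections import OrderedDict

LINE_SIZE = 64  # bytes per cache line (assumed for illustration)


class LRUCache:
    """Hypothetical fully associative cache with LRU replacement."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = OrderedDict()  # line number -> None, ordered LRU to MRU

    def touch(self, line):
        """Access a line; return True on a hit. On a miss, insert it and evict the LRU line if full."""
        if line in self.lines:
            self.lines.move_to_end(line)
            return True
        self.lines[line] = None
        if len(self.lines) > self.num_lines:
            self.lines.popitem(last=False)
        return False


def hit_ratio(trace, num_lines, lookahead=0):
    """Return the load hit ratio for a trace of (op, addr) tuples.

    lookahead=0 models a plain cache. lookahead=W models an assumed IPU that,
    at each step, preprocesses the instruction W slots ahead and prefetches
    its line if it is a load.
    """
    cache = LRUCache(num_lines)
    hits = loads = 0
    for i, (op, addr) in enumerate(trace):
        # IPU side: look W instructions ahead of the DPU.
        if lookahead:
            j = i + lookahead
            if j < len(trace) and trace[j][0] == "load":
                cache.touch(trace[j][1] // LINE_SIZE)
        # DPU side: execute the current instruction.
        if op == "load":
            loads += 1
            if cache.touch(addr // LINE_SIZE):
                hits += 1
    return hits / loads if loads else 0.0


if __name__ == "__main__":
    # Streaming trace: 100 loads to distinct lines, each followed by an ALU op.
    trace = [ins for k in range(100) for ins in (("load", k * LINE_SIZE), ("alu", 0))]
    print(hit_ratio(trace, num_lines=8))               # plain cache
    print(hit_ratio(trace, num_lines=8, lookahead=4))  # same cache size, with lookahead
```

On this streaming trace every load touches a new line, so the plain 8-line cache never hits. With a lookahead of 4, every load except the first two has already been prefetched, giving a hit ratio of 0.98 with the same cache size. This mirrors the abstract's claim that preprocessing loads ahead of execution can raise the hit ratio without enlarging the cache, although a toy trace like this says nothing about IPULOC's measured results.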