Modelling communication overhead for accessing local memories in hardware accelerators

Local memories increase the efficiency of hardware accelerators by enabling fast access to frequently used data. In addition, their access latencies are deterministic, which allows more accurate evaluation of system performance during design exploration. We have previously proposed local memories with an un-cached memory slave interface that permits a program running on the processor to access variables stored locally in the hardware accelerator. While this relaxes the memory constraints for porting code sections to hardware accelerators, the read/write penalties that the processor incurs when accessing these local memories must now be considered during design exploration. Selecting profitable hardware accelerators therefore requires an accurate performance model that takes these access penalties into account. In this paper, we propose a novel model to estimate the penalty incurred due to memory dependencies between the program running on the processor and the local memories in the FPGA hardware accelerator. This model can be used in an automated design exploration framework for heterogeneous FPGA platforms to select profitable hardware accelerators with local memories.