Smaller Split L-1 Data Caches for Multi-core Processing Systems

As more cores (processing elements) are included in a single chip, it is likely that per-core L-1 caches will become smaller while more cores will share L-2 cache resources. It therefore becomes more critical to improve the utilization of L-1 caches and to minimize sharing conflicts in L-2 caches. In our prior work we have shown that using smaller but separate L-1 array data and L-1 scalar data caches, instead of a single larger L-1 data cache, can lead to significant performance improvements. In this paper we extend our experiments by varying cache design parameters, including block size, associativity, and number of sets, for the L-1 array and L-1 scalar caches. We also present the effect of separate array and scalar caches on the non-uniform accesses to different L-1 cache sets that are exhibited when a single L-1 data cache is used. For this purpose we use the third and fourth central moments (skewness and kurtosis) to characterize the access patterns. Our experiments show that, for several embedded benchmarks from MiBench, split data caches significantly mitigate the problem of non-uniform accesses to cache sets, leading to more uniform utilization of cache resources, fewer set conflicts, and fewer hot spots in the cache. They also show that neither higher set-associativity nor larger block sizes are necessary with split cache organizations.
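
For reference, the skewness and kurtosis mentioned above can be computed from per-set access counts. The following LaTeX sketch gives the standard definitions we assume; the symbols a_i (access count of set i), N (number of sets), and \mu, \sigma (their mean and standard deviation) are illustrative notation, not taken from the paper itself.

\[
\mu = \frac{1}{N}\sum_{i=1}^{N} a_i, \qquad
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}\left(a_i - \mu\right)^2
\]
\[
\text{skewness} = \frac{\tfrac{1}{N}\sum_{i=1}^{N}\left(a_i - \mu\right)^3}{\sigma^{3}}, \qquad
\text{kurtosis} = \frac{\tfrac{1}{N}\sum_{i=1}^{N}\left(a_i - \mu\right)^4}{\sigma^{4}}
\]

Under these standard definitions, large skewness or kurtosis of the per-set access counts indicates that accesses are concentrated in a few heavily used (hot) sets, while values closer to those of a uniform distribution indicate more even utilization of the cache sets.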