The cache injection/cofetch architecture: initial performance evaluation

One of the major problems in a number of SM (shared memory) and DSM (distributed shared memory) applications is the overall cost of read misses when (a) system latencies are relatively large, and (b) a shared data item is read relatively few times by each of the processors in the system. Modern SM and DSM systems are typically based on off-the-shelf microprocessors, which include no dedicated support for addressing this problem. Consequently, the major goal of our research is to develop a new concept that can be incorporated into next-generation microprocessors so that they become more efficient in the sense described above. Existing 64-bit processors support only data prefetching (PF) as a means of mitigating the negative effects of the problem. Our research introduces a new concept referred to as cache injection (CI), as well as the related cache injection/cofetch architecture (CICA). An initial performance evaluation is carried out using a simulation methodology based on a set of synthetic benchmarks.