Paper Information - Tolerating Memory Latency Using a Hardware-Based Active-Pushing Technique

The pre-sending technique, originally proposed for distributed shared-memory systems, pushes data into the cache instead of having the cache pull it, with the aim of reducing communication traffic. To improve the cache hit ratio effectively, this paper proposes a hardware-based active-pushing technique in which data owners, such as the lower levels of the memory hierarchy, actively push predicted data at the right moment to an upper level closer to the CPU, thereby reducing memory stall time. A further optimization targeting the timeliness of active pushing is also introduced. Prefetching, pre-sending, active pushing, and optimized active pushing are each simulated on the "Longtium" R2 microprocessor simulation platform. Experimental results show that both active pushing and its optimized variant improve the cache hit ratio significantly compared with prefetching and pre-sending.
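The push idea in the abstract can be illustrated with a toy model. The sketch below is not the paper's design: the class names, the LRU cache, and the next-block predictor are all assumptions made for illustration, and timing (the subject of the paper's timeliness optimization) is not modeled. It only shows the direction of control: the cache requests the block it missed on, and the lower level decides on its own which extra blocks to send upward.

```python
from collections import OrderedDict


class Cache:
    """Tiny fully associative LRU cache of block addresses (assumed model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def lookup(self, block):
        """Return True on a hit and refresh the block's LRU position."""
        if block in self.blocks:
            self.blocks.move_to_end(block)
            return True
        return False

    def fill(self, block):
        """Install a block, evicting the least recently used one if full."""
        self.blocks[block] = True
        self.blocks.move_to_end(block)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)


class PushingMemory:
    """Lower memory level that answers misses and pushes extra blocks."""

    def __init__(self, push_depth):
        self.push_depth = push_depth  # 0 disables pushing (pure pull)

    def serve(self, block, cache):
        cache.fill(block)  # the block the cache actually asked for
        # Hypothetical predictor: assume the next sequential blocks are
        # needed soon and push them without being asked.
        for offset in range(1, self.push_depth + 1):
            cache.fill(block + offset)


def hit_ratio(trace, capacity, push_depth):
    """Replay a block-address trace and return the cache hit ratio."""
    cache = Cache(capacity)
    memory = PushingMemory(push_depth)
    hits = 0
    for block in trace:
        if cache.lookup(block):
            hits += 1
        else:
            memory.serve(block, cache)
    return hits / len(trace)


if __name__ == "__main__":
    trace = list(range(16))  # a purely sequential access pattern
    for depth in (0, 1, 3):
        print(depth, hit_ratio(trace, capacity=8, push_depth=depth))
```

On this sequential trace the hit ratio rises with the push depth because every miss brings in the following blocks as well. A real predictor would mispredict on irregular patterns, and pushed blocks could then evict useful data, which this sketch does not attempt to measure.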

Jie Chen | Xiaoya Fan | Liwen Shi | Hangpei Tian | Xiaoping Huang

[1] Chia-Lin Yang, et al. Tolerating memory latency through push prefetching for pointer-intensive applications, 2004, TACO.

[2] Stefan G. Berg. Cache Prefetching, 2002.

[3] Chia-Lin Yang, et al. Tolerating memory latency through push prefetching for pointer-intensive applications, 2004.

[4] Donald Yeung, et al. Multi-chain prefetching: effective exploitation of inter-chain memory parallelism for pointer-chasing codes, 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.

[5] Josep Torrellas, et al. Comparing data forwarding and prefetching for communication-induced misses in shared-memory MPs, 1998, ICS '98.

[6] Surendra Byna, et al. Improving Data Access Performance with Server Push Architecture, 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.

[7] Chia-Lin Yang, et al. Push vs. pull: data movement for linked data structures, 2000, ICS '00.

[8] Douglas J. Joseph, et al. Prefetching Using Markov Predictors, 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.

[9] Charles J. Hughes, et al. Prefetching linked data structures in systems with merged dram-logic, 2000.

[10] Thomas Alexander, et al. Distributed prefetch-buffer/cache design for high performance memory systems, 1996, Proceedings. Second International Symposium on High-Performance Computer Architecture.

[11] Josep Torrellas, et al. Using a user-level memory thread for correlation prefetching, 2002, ISCA.

[12] Luis Angel D. Bathen, et al. Optimal multistream sequential prefetching in a shared cache, 2007, TOS.

[13] Dirk Grunwald, et al. Content-Based Prefetching: Initial Results, 2000, Intelligent Memory Systems.

[14] Per Stenström, et al. A prefetching technique for irregular accesses to linked data structures, 2000, Proceedings Sixth International Symposium on High-Performance Computer Architecture (HPCA-6).

[15] Andreas Moshovos, et al. Dependence based prefetching for linked data structures, 1998, ASPLOS VIII.