Tolerating Memory Latency Using a Hardware-Based Active-Pushing Technique

The pre-sending technique, originally proposed for distributed shared memory systems, pushes data into the cache instead of pulling it, with the aim of reducing communication traffic. To improve the cache hit ratio effectively, this paper proposes a hardware-based active-pushing technique, which directs data owners, such as a lower level of the memory hierarchy, to actively push predicted data at the right moment to an upper level that is closer to the CPU, thereby reducing memory stall time. A further optimization targeting the timeliness of the active-pushing technique is also introduced. The prefetching, pre-sending, active-pushing, and optimized active-pushing techniques are each simulated on the "Longtium" R2 microprocessor simulation platform. Experimental results show that both the active-pushing technique and its optimized variant improve the cache hit ratio significantly compared with prefetching and pre-sending.
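To make the push idea concrete, the following is a minimal toy sketch and not the simulator or predictor used in the paper. It assumes a small fully associative LRU cache as the upper level and a hypothetical next-block predictor in the lower level; the names `Cache`, `PushingMemory`, and `hit_ratio` are illustrative inventions. When pushing is enabled, the lower level serves each demand miss and then, on its own initiative, pushes the block it predicts will be needed next.

```python
from collections import OrderedDict


class Cache:
    """Tiny fully associative LRU cache of block numbers (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def contains(self, block):
        # A hit refreshes the block's LRU position.
        if block in self.blocks:
            self.blocks.move_to_end(block)
            return True
        return False

    def fill(self, block):
        # Insert (or refresh) a block, evicting the least recently used one.
        self.blocks[block] = True
        self.blocks.move_to_end(block)
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)


class PushingMemory:
    """Lower memory level that may push a predicted block to the upper level.

    Assumption: a simple next-block predictor stands in for whatever
    prediction logic real hardware would use.
    """

    def __init__(self, upper, push_enabled):
        self.upper = upper
        self.push_enabled = push_enabled

    def request(self, block):
        self.upper.fill(block)          # serve the demand miss (pull)
        if self.push_enabled:
            self.upper.fill(block + 1)  # actively push the predicted block


def hit_ratio(trace, push_enabled, capacity=4):
    """Return the upper-level hit ratio for a trace of block numbers."""
    cache = Cache(capacity)
    memory = PushingMemory(cache, push_enabled)
    hits = 0
    for block in trace:
        if cache.contains(block):
            hits += 1
        else:
            memory.request(block)
    return hits / len(trace)
```

Calling `hit_ratio(trace, False)` and `hit_ratio(trace, True)` on the same trace compares pure demand fetching with pushing. The numbers from this toy model say nothing about the paper's measured results; it only illustrates that the data owner, rather than the requester, initiates the transfer.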
