Improving the Effectiveness of Context-Based Prefetching with Multi-order Analysis

Data prefetching is an effective way to accelerate data access in high-end computing systems and to bridge the widening performance gap between processor and memory. In recent years, context-based data prefetching has received intensive attention because of its general applicability. In this study, we provide a preliminary analysis of how the context order affects the effectiveness of context-based prefetching. Motivated by the observations from this analysis, we propose a new context-based prefetching method, Multi-Order Context-based (MOC) prefetching, which adopts multi-order context analysis to increase prefetching effectiveness. We have carried out simulation testing with the SPEC CPU2006 benchmarks on an enhanced CMP$im simulator. The simulation results show that the proposed MOC prefetching method outperforms existing single-order prefetching and effectively reduces data-access latency.
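To make the notion of "order" concrete, the following is a minimal sketch of a generic context-based predictor that keeps one table per order and prefers the highest order whose context has been seen before. It is an illustration under stated assumptions, not the MOC design itself, whose details are not given in this abstract. The class name `MultiOrderPredictor`, the helper `run_trace`, the longest-match fallback policy, and the sample trace are all hypothetical.

```python
from collections import Counter, defaultdict


class MultiOrderPredictor:
    """Hypothetical multi-order context predictor (illustration only)."""

    def __init__(self, max_order=3):
        self.max_order = max_order
        # tables[k] maps a tuple of the last k addresses to a Counter
        # of the addresses that followed that context.
        self.tables = {k: defaultdict(Counter) for k in range(1, max_order + 1)}
        self.history = []

    def predict(self):
        """Return (predicted_address, order_used), or (None, 0) if no match."""
        # Assumed policy: try the longest context first, then fall back.
        for k in range(self.max_order, 0, -1):
            if len(self.history) >= k:
                successors = self.tables[k].get(tuple(self.history[-k:]))
                if successors:
                    return successors.most_common(1)[0][0], k
        return None, 0

    def update(self, addr):
        """Record addr as the successor of every context that precedes it."""
        for k in range(1, self.max_order + 1):
            if len(self.history) >= k:
                self.tables[k][tuple(self.history[-k:])][addr] += 1
        # Keep only as much history as the highest order needs.
        self.history = (self.history + [addr])[-self.max_order:]


def run_trace(trace, max_order):
    """Feed a whole access trace, then predict the next address."""
    predictor = MultiOrderPredictor(max_order)
    for addr in trace:
        predictor.update(addr)
    return predictor.predict()


if __name__ == "__main__":
    # Made-up trace: block 2 is usually followed by 3, but after the
    # longer context (3, 1, 2) it was followed by 4.
    trace = [1, 2, 3, 1, 2, 4, 1, 2, 3, 1, 2]
    print("order 1 only :", run_trace(trace, max_order=1))
    print("orders 1 to 3:", run_trace(trace, max_order=3))
```

On this made-up trace the two configurations disagree about the next address, which shows why the choice of order can matter: a short context captures the most frequent successor, while a longer context can separate access patterns that share a common suffix.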
