Reducing Cache Pollution of Threaded Prefetching by Controlling Prefetch Distance

Threaded prefetching on Chip Multiprocessors (CMPs) issues memory requests for data that the main computation will need later, and can therefore increase pressure on the limited shared cache space and bus bandwidth. In our earlier work, we proposed an effective threaded prefetching technique that selects a proper prefetch distance for a specific application to improve the timeliness of prefetching. In this paper, we first estimate the upper limit of the prefetch distance for a specific application under our proposed threaded prefetching technique, and then analyze the effect of increasing the prefetch distance on shared cache pollution. Our experimental evaluations indicate that the bounded range of effective prefetch distances can be determined with our method, and that shared cache pollution can be reduced by controlling the prefetch distance in our proposed threaded prefetching technique.
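
To make the distance/pollution trade-off concrete, the following is a minimal toy model in Python. It is not the authors' method or simulator. A main thread walks a list of nodes, a helper thread stays a fixed prefetch distance D ahead, and both share one fully associative LRU cache of C lines. The names `LRUCache`, `simulate`, and `largest_clean_distance` are hypothetical, as are the lockstep timing and the one-line-per-node layout. All of these are assumptions made for illustration. The model ignores memory latency, so it shows only the pollution side of the trade-off, not timeliness.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative LRU cache that tracks line addresses only."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # least recently used line first

    def access(self, addr):
        """Touch addr; return (hit, evicted_addr or None)."""
        if addr in self.lines:
            self.lines.move_to_end(addr)  # promote to most recently used
            return True, None
        evicted = None
        if len(self.lines) >= self.capacity:
            evicted, _ = self.lines.popitem(last=False)  # drop LRU line
        self.lines[addr] = True
        return False, evicted


def simulate(num_nodes, distance, capacity):
    """Hypothetical model: the main thread visits nodes 0..num_nodes-1 while a
    helper thread prefetches `distance` nodes ahead into the same cache.
    Assumes distance >= 1 and that both threads advance in lockstep."""
    cache = LRUCache(capacity)
    pending = set()  # prefetched lines not yet consumed by the main thread
    stats = {"useful": 0, "wasted": 0, "main_misses": 0}

    def on_evict(addr):
        # A prefetched line evicted before the main thread uses it is pollution.
        if addr in pending:
            pending.discard(addr)
            stats["wasted"] += 1

    def prefetch(node):
        hit, evicted = cache.access(node)
        if evicted is not None:
            on_evict(evicted)
        if not hit:
            pending.add(node)

    # Warm-up: the helper thread starts `distance` nodes ahead.
    for node in range(min(distance, num_nodes)):
        prefetch(node)

    for i in range(num_nodes):
        if i + distance < num_nodes:
            prefetch(i + distance)  # helper keeps exactly `distance` ahead
        hit, evicted = cache.access(i)  # main thread's demand access
        if evicted is not None:
            on_evict(evicted)
        if i in pending:  # still cached, so the prefetch paid off
            pending.discard(i)
            stats["useful"] += 1
        if not hit:
            stats["main_misses"] += 1
    return stats


def largest_clean_distance(num_nodes, capacity, max_distance):
    """Largest distance in 1..max_distance with no wasted prefetches (0 if none)."""
    best = 0
    for d in range(1, max_distance + 1):
        if simulate(num_nodes, d, capacity)["wasted"] == 0:
            best = d
    return best
```

In this toy model, a prefetched line survives until use only while 2D + 1 <= C. By the time node i is used, the D nodes consumed since it was prefetched and the D newer prefetches all sit above it in the LRU stack. With C = 8, `largest_clean_distance(100, 8, 16)` returns 3. At D = 4, most prefetched lines are evicted before use and turn back into demand misses. A real shared cache adds set associativity, other cores' traffic, and latency, so this bound is illustrative only.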
