Solving Parameter Selection Problem of Helper Thread Prefetching via Realtime Hardware Performance Monitoring

Helper thread prefetching has the potential to improve the performance of irregular, data-intensive applications, but its effectiveness depends on how efficiently and quickly the control parameters can be selected. In prior work, parameter selection and optimization required executing the application exhaustively. In this study, we propose HPCF, a helper thread prefetching control framework that adjusts the control parameters of the helper thread automatically. We present the idea, initial design, and implementation of HPCF. In particular, we establish a dynamic control model of helper thread prefetching and develop a two-level parameter selection algorithm. We evaluate HPCF on commodity multi-core platforms using selected benchmarks from SPEC2006, Olden, and SSCA2. Results show that our approach performs almost as well as our prior static Skip Helper Thread prefetching scheme while selecting parameters in a single execution of the application. It achieves up to 33.3%, 18.2%, and 18.6% performance improvement for the MST, MCF, and SSCA2 benchmarks, respectively.
