Helper Thread Prefetching Control Framework on Chip Multi-processor

Helper thread prefetching can improve the performance of irregular data-intensive applications. However, the quality of helper thread prefetching depends on the values of its control parameters. Finding good values for these parameters with traditional manual methods is a time-consuming and complicated enumeration process. To select better parameter values dynamically, this paper proposes a helper thread prefetching control framework (HPCF) based on the dynamic behavior of irregular applications. The proposed HPCF is evaluated on commodity multi-core platforms using selected benchmarks from SPEC2006, Olden, and Scalable Synthetic Compact Application #2 (SSCA2). Results show that the proposed approach is effective: the performance gain is similar to that of skip helper thread prefetching with the best manually chosen parameter values. The performance improvements for the Mst, Mcf, and SSCA2 benchmarks are 34.5%, 18.9%, and 21.4%, respectively. More importantly, compared with traditional manual methods, the parameter values of a helper thread need not be entered manually and can be found quickly with the HPCF tool.
