A hybrid sample generation approach in speculative multithreading

Speculative multithreading (SpMT) is a thread-level automatic parallelization technique for accelerating sequential programs. Machine learning has been successfully applied to SpMT to improve its performance. An appropriate sample set, which acts as the knowledge provider, is important for machine learning-based (ML-based) thread partitioning. The conventional heuristic rules-based (HR-based) sample generation approach cannot generate adaptive samples. A hybrid sample generation approach can break this bottleneck. With this method, we first automatically generate samples by heuristic rules; each sample is MIPS code annotated with spawning points (SPs) and control quasi-independent points (CQIPs). Second, for every sample, we manually adjust the positions of the SPs and CQIPs and rebuild the pre-computation slice to obtain better performance. Third, we build a model to ensure that the probability of adjusting toward the optimal partition positions keeps increasing. Three measures are taken in implementing this approach: bias weighting, preservation of optimal solutions, and summarization of greedy rules. In this way, we raise the adjustment frequency for frequently called subroutines and preserve the optimal partition positions, so as to achieve a stable speedup improvement. The SPEC2000 and Olden benchmarks are used as input on Prophet, a generic SpMT processor for evaluating the performance of multithreaded programs. Experiments show that our approach obtains better sample sets, which deliver a performance improvement about 86.9% better on 16 cores than that of the samples generated by the HR-based approach. The experimental results also show that this approach is effective at generating sample sets for ML-based thread partitioning.
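The interplay of bias weighting and preservation of optimal solutions can be illustrated with a minimal Python sketch. It is not the authors' implementation. Every name in it (`adjust_partitions`, `toy_evaluate`, `toy_propose`, the call counts, and the toy scoring function) is a hypothetical stand-in, and a random perturbation replaces the manual adjustment step described above. A real setup would obtain the score from a simulator run rather than from a closed-form function.

```python
import random

def adjust_partitions(call_counts, initial, evaluate, propose, rounds=200, seed=0):
    """Bias-weighted adjustment loop that never discards the best partition found."""
    rng = random.Random(seed)
    names = list(call_counts)
    # Bias weighting (assumed form): selection probability is proportional to call count.
    weights = [call_counts[n] for n in names]
    best = dict(initial)  # best (SP, CQIP) pair seen so far, per subroutine
    best_score = {n: evaluate(n, initial[n]) for n in names}
    picks = {n: 0 for n in names}
    for _ in range(rounds):
        name = rng.choices(names, weights=weights, k=1)[0]
        picks[name] += 1
        candidate = propose(rng, best[name])
        score = evaluate(name, candidate)
        # Preservation of optimal solutions: replace only on strict improvement.
        if score > best_score[name]:
            best[name], best_score[name] = candidate, score
    return best, best_score, picks

def toy_evaluate(name, partition):
    # Hypothetical stand-in for a simulator measurement; peaks at SP=4, CQIP=12.
    sp, cqip = partition
    return -abs(sp - 4) - abs(cqip - 12)

def toy_propose(rng, partition):
    # Hypothetical adjustment: move each point by at most one instruction,
    # keeping the CQIP strictly after the SP.
    sp, cqip = partition
    sp = max(0, sp + rng.choice((-1, 0, 1)))
    cqip = max(sp + 1, cqip + rng.choice((-1, 0, 1)))
    return sp, cqip

call_counts = {"hot": 90, "cold": 10}           # assumed profile data
initial = {"hot": (0, 20), "cold": (0, 20)}     # assumed HR-generated positions
best, best_score, picks = adjust_partitions(call_counts, initial, toy_evaluate, toy_propose)
```

Because a candidate replaces the stored partition only when it scores strictly higher, the recorded score per subroutine never decreases, which is one simple way to keep the improvement stable across adjustment rounds.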
