Online Tuning of Parallelism Degree in Parallel Nesting Transactional Memory

This paper addresses the problem of self-tuning the parallelism degree in Transactional Memory (TM) systems that support parallel nesting (PN-TM). This problem has long been investigated for TMs that do not support nesting but, to the best of our knowledge, has never been studied in the context of PN-TMs. The problem is inherently harder in PN-TMs, since these require identifying the optimal parallelism degree not only for top-level transactions but also for nested sub-transactions. The increased dimensionality of the problem raises new challenges (e.g., a larger search space and a greater proneness to local maxima), which are unsatisfactorily addressed by self-tuning solutions conceived for flat nesting TMs. We tackle these challenges by proposing AUTOPN, an on-line self-tuning system that combines model-driven learning techniques with localized search heuristics in order to pursue a twofold goal: i) enhancing convergence speed, by identifying the most promising region of the search space via model-driven techniques, and ii) increasing robustness against modeling errors, via a final local search phase aimed at refining the model's prediction. We further address the problem of tuning the duration of the monitoring windows used to collect feedback on the system's performance, by introducing novel, domain-specific mechanisms aimed at striking an optimal trade-off between latency and accuracy of the self-tuning process. We integrated AUTOPN with a state-of-the-art PN-TM (JVSTM) and evaluated it via an extensive experimental study. The results of this study show that AUTOPN can achieve up to 45× higher accuracy and 4× faster convergence when compared with several on-line optimization techniques (gradient descent, simulated annealing and genetic algorithms), some of which have already been used successfully in the context of flat nesting TMs.
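The two-phase idea described above (a model proposes a promising configuration, then a local search refines it against measured performance) can be illustrated with a minimal sketch. This is not AUTOPN's actual algorithm: the names `tune`, `predict` and `measure`, the use of plain hill climbing for the local phase, and the feasibility constraint that top-level threads times nested threads must not exceed the core count are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

# A configuration is (top-level threads, nested threads per transaction).
Config = Tuple[int, int]


def feasible(cfg: Config, cores: int) -> bool:
    # Assumption: a configuration is valid if it does not oversubscribe cores.
    t, n = cfg
    return t >= 1 and n >= 1 and t * n <= cores


def neighbors(cfg: Config, cores: int) -> List[Config]:
    # Configurations that differ by one thread in a single dimension.
    t, n = cfg
    candidates = [(t + 1, n), (t - 1, n), (t, n + 1), (t, n - 1)]
    return [c for c in candidates if feasible(c, cores)]


def tune(
    measure: Callable[[Config], float],  # hypothetical: observed throughput
    predict: Callable[[Config], float],  # hypothetical: model's estimate
    cores: int,
) -> Tuple[Config, float]:
    space = [
        (t, n)
        for t in range(1, cores + 1)
        for n in range(1, cores + 1)
        if feasible((t, n), cores)
    ]
    # Phase 1: let the (possibly inaccurate) model pick the starting point.
    best = max(space, key=predict)

    # Phase 2: hill climbing on measured throughput, caching each measurement
    # so that no configuration is sampled twice.
    cache: Dict[Config, float] = {}

    def observed(cfg: Config) -> float:
        if cfg not in cache:
            cache[cfg] = measure(cfg)
        return cache[cfg]

    while True:
        candidate = max(neighbors(best, cores), key=observed, default=best)
        if observed(candidate) <= observed(best):
            return best, observed(best)
        best = candidate
```

For example, with a synthetic `measure` that peaks at (4, 2) and a `predict` that wrongly peaks at (5, 3), `tune(measure, predict, 16)` starts from the model's guess and the local phase walks to the measured optimum. In a real system `measure` would run the workload for one monitoring window, so the number of calls to it is the cost that a good starting point helps reduce.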
