Improving performance of transactional memory through machine learning

Transactional memory (TM) is a programming paradigm that facilitates parallel programming for multi‐core processors. In the last few years, some chip manufacturers provided hardware support for TM to reduce runtime overhead of Software Transactional Memory (STM). In this work, we offer two optimization techniques for TMs. The first technique focuses on Restricted Transactional Memory (RTM) in Intel's Haswell processor and shows that while in some applications, RTM improves performance over STM, in some others, it falls behind STM. We exploit this variability and propose an adaptive technique that switches between RTM and STM, statically. The second technique focuses on the overhead of TM and enhances the speed of the adaptive system. In particular, we focus on the size of transactions and improve performance by changing the transaction size. Optimizing the transaction size manually is a time‐consuming process and requires significant software engineering effort. We use a combination of Linear Regression (LR) and decision tree to decide on the transaction size, automatically. We evaluate our optimization techniques using a set of benchmarks from NAS, DiscoPoP, and STAMP benchmark suites. Our experimental results reveal that our optimization techniques are able to improve the performance of TM programs by 9% and energy‐delay by 15%, on average.

[1]  Maged M. Michael,et al.  Evaluation of Blue Gene/Q hardware support for transactional memories , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).

[2]  J. Ross Quinlan,et al.  Induction of Decision Trees , 1986, Machine Learning.

[3]  Hugh Garraway Parallel Computer Architecture: A Hardware/Software Approach , 1999, IEEE Concurrency.

[4]  Christopher J. Hughes,et al.  Performance evaluation of Intel® Transactional Synchronization Extensions for high-performance computing , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).

[5]  James R. Larus,et al.  Software and the Concurrency Revolution , 2005, ACM Queue.

[6]  Yang Xiao,et al.  Improving Performance of Transactional Applications through Adaptive Transactional Memory , 2016, 2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP).

[7]  Kunle Olukotun,et al.  Transactional memory coherence and consistency , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[8]  Zhen Li,et al.  Discovery of Potential Parallelism in Sequential Programs , 2013, 2013 42nd International Conference on Parallel Processing.

[9]  Kunle Olukotun,et al.  STAMP: Stanford Transactional Applications for Multi-Processing , 2008, 2008 IEEE International Symposium on Workload Characterization.

[10]  Maurice Herlihy,et al.  Transactional Memory: Architectural Support For Lock-free Data Structures , 1993, Proceedings of the 20th Annual International Symposium on Computer Architecture.

[11]  D. Sudakin,et al.  Appendix A , 2007, Journal of agromedicine.

[12]  Aiko M. Hormann,et al.  Programs for Machine Learning. Part I , 1962, Inf. Control..

[13]  Jean-François Méhaut,et al.  Dynamic Thread Mapping Based on Machine Learning for Transactional Memory Applications , 2012, Euro-Par.

[14]  Sameer Kulkarni,et al.  Towards Applying Machine Learning to Adaptive Transactional Memory , 2011 .

[15]  Samuel Williams,et al.  The Landscape of Parallel Computing Research: A View from Berkeley , 2006 .

[16]  Torvald Riegel,et al.  Dynamic performance tuning of word-based software transactional memory , 2008, PPoPP.

[17]  José Nelson Amaral,et al.  Multi-dimensional Evaluation of Haswell's Transactional Memory Performance , 2014, 2014 IEEE 26th International Symposium on Computer Architecture and High Performance Computing.

[18]  A. Goldberger Best Linear Unbiased Prediction in the Generalized Linear Regression Model , 1962 .

[19]  Pascal Felber,et al.  Identifying the Optimal Level of Parallelism in Transactional Memory Applications , 2013, NETYS.

[20]  Mike Dai Wang Exploring the Performance and Programmability Design Space of Hardware Transactional Memory , 2014 .

[21]  Nir Shavit,et al.  Transactional Locking II , 2006, DISC.

[22]  Jean-François Méhaut,et al.  Adaptive thread mapping strategies for transactional memory applications , 2014, J. Parallel Distributed Comput..

[23]  Bratin Saha,et al.  McRT-STM: a high performance software transactional memory system for a multi-core runtime , 2006, PPoPP '06.

[24]  David A. Wood,et al.  LogTM: log-based transactional memory , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006..

[25]  Timothy J. Slegel,et al.  Transactional Memory Architecture and Implementation for IBM System Z , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

[26]  David H. Bailey,et al.  The Nas Parallel Benchmarks , 1991, Int. J. High Perform. Comput. Appl..

[27]  Yang Xiao,et al.  Automatic Optimization of Software Transactional Memory Through Linear Regression and Decision Tree , 2015, ICA3PP.