Exploring the Performance and Programmability Design Space of Hardware Transactional Memory

In this paper, we study the programmability and performance design space of the new hardware transactional memory (HTM) framework provided by Intel’s Haswell architecture. To this end, we first present a performance characterization of Intel TSX using a simple array access microbenchmark. Through a comprehensive study, we identify several important trends, such as how transaction size, write ratio inside transactions, and retry count relate to transaction abort rate and performance. Next, we explore code transformations, such as computation splitting and privatization, for optimizing the performance of Moldyn, a molecular dynamics simulation from the CHARMM [4] molecular dynamics simulation and analysis package. We leverage our TSX performance characterization to guide our Moldyn code transformations and to minimize the effort of tuning their parameters. We found that a hardware TM solution using computation splitting and privatization can be both easier to program and faster than a hand-tuned solution based on fine-grain pthread locks that includes those same optimizations.
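As a rough illustration of the two transformations named above, the following minimal sketch shows one way computation splitting and privatization might look for a pairwise force update. It is not the Moldyn code: `pair_force`, `accumulate_forces`, and the `chunk` parameter are hypothetical names, and a `threading.Lock` stands in for the hardware transaction, since Python has no TSX interface. The force kernel runs outside the critical section (splitting), each thread accumulates into its own buffer (privatization), and the merge into the shared array proceeds in chunks whose size plays the role of transaction size.

```python
import threading


def pair_force(i, j):
    # Hypothetical stand-in for a real force kernel; any pure function works.
    return 1.0 / (1 + abs(i - j))


def accumulate_forces(pairs, n_atoms, n_threads=4, chunk=8):
    """Sum pairwise forces into a shared array using private per-thread buffers."""
    forces = [0.0] * n_atoms
    region = threading.Lock()  # assumption: stands in for one hardware transaction

    def worker(my_pairs):
        # Privatization: all updates go to a thread-local buffer first.
        private = [0.0] * n_atoms
        for i, j in my_pairs:
            # Computation splitting: the expensive kernel runs outside
            # any critical section.
            f = pair_force(i, j)
            private[i] += f
            private[j] -= f
        # Merge in chunks; each chunk models one short transaction.
        for start in range(0, n_atoms, chunk):
            with region:
                for k in range(start, min(start + chunk, n_atoms)):
                    forces[k] += private[k]

    threads = [
        threading.Thread(target=worker, args=(pairs[t::n_threads],))
        for t in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return forces
```

In this sketch, a smaller `chunk` means more, shorter critical sections, and a larger one means fewer, longer ones; this is the kind of parameter that a characterization of abort rate against transaction size would help choose.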

[1] M. Karplus, et al. CHARMM: A program for macromolecular energy, minimization, and dynamics calculations, 1983.

[2] Kunle Olukotun, et al. Characterization of TCC on chip-multiprocessors, 2005, 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05).

[3] Nir Shavit, et al. Transactional Locking II, 2006, DISC.

[4] Mark Moir, et al. Hybrid transactional memory, 2006, ASPLOS XII.

[5] Kunle Olukotun, et al. A Scalable, Non-blocking Approach to Transactional Memory, 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.

[6] Donald E. Porter, et al. TxLinux: using and managing hardware transactional memory in an operating system, 2007, SOSP.

[7] David A. Wood, et al. LogTM-SE: Decoupling Hardware Transactional Memory from Caches, 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.

[8] Torvald Riegel, et al. Dynamic performance tuning of word-based software transactional memory, 2008, PPoPP.

[9] Maged M. Michael, et al. RingSTM: scalable transactions with a single atomic instruction, 2008, SPAA '08.

[10] Kunle Olukotun, et al. STAMP: Stanford Transactional Applications for Multi-Processing, 2008, 2008 IEEE International Symposium on Workload Characterization.

[11] Rachid Guerraoui, et al. Stretching transactional memory, 2009, PLDI '09.

[12] Mihai Burcea, et al. Transactional memory support for scalable and transparent parallelization of multiplayer games, 2010, EuroSys '10.

[13] Jason Helge Anderson, et al. Parallelizing FPGA placement using Transactional Memory, 2010, 2010 International Conference on Field-Programmable Technology.

[14] Daniel Lupei. A Study of Conflict Detection in Software Transactional Memory, 2010.

[15] Maged M. Michael, et al. Evaluation of Blue Gene/Q hardware support for transactional memories, 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).

[16] Michael Gschwind, et al. The IBM Blue Gene/Q Compute Chip, 2012, IEEE Micro.

[17] Christopher J. Hughes, et al. Performance evaluation of Intel® Transactional Synchronization Extensions for high-performance computing, 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).