The Influence of Malloc Placement on TSX Hardware Transactional Memory

In this paper, we demonstrate the impact of memory allocator placement policies on the performance of applications that use hardware transactional memory. In particular, commonly used allocators such as the default GNU glibc malloc may place objects in a way that causes hardware transactions to abort consistently, even when running single-threaded. In multithreaded applications, these consistent aborts can force applications to fall back to using locks, significantly limiting parallelism. We also show that index-aware allocators can avoid these pathological memory placements.

We have observed read-only transactions commit where the cache footprint exceeds the L2 size, but have never observed transactions commit where the footprint is above the size of the L3. With TSX-RTM, misses turn into aborts. Aborts amplify the impact of misses: cycles are wasted, and the write-set is lost from the L1. Aborts also force the slow path and the loss of concurrent execution for transactional lock elision (TLE). A thread-local problem thus shifts to a global impediment: misses become aborts, and aborts become serialization.
