Notary: Hardware techniques to enhance signatures

Hardware signatures have recently been proposed as an efficient mechanism to detect conflicts among concurrently running transactions in transactional memory systems (e.g., Bulk, LogTM-SE, and SigTM). Signatures use a fixed amount of hardware to represent an unbounded number of addresses, but they may lead to false conflicts (detecting a conflict when none exists). Previous work recommends implementing signatures as parallel Bloom filters with two or four hash functions (e.g., H3). Current signature designs have two problems. First, H3 implementations use many XOR gates, which increases hardware area and power overheads. Second, signature false positives can result from conflicts with signature bits set by private memory addresses that do not require isolation. This paper develops Notary, a coupling of two signature enhancements that ameliorate these problems. First, we use address entropy analysis to develop page-block-XOR (PBX) hashing and show that it performs similarly to H3 at lower hardware cost. Second, we introduce a privatization interface that allows the programmer to explicitly declare shared and private heap memory allocation. Privatization reduces false conflicts arising from private memory accesses and can allow a smaller signature to be used. Results from custom transistor-level layouts of H3 and PBX, along with full-system simulation of a 16-core chip multiprocessor implementing LogTM-SE, show that (a) PBX hashing performs similarly to H3 hashing while requiring up to 24% less area and 4.7% less power overhead, and (b) privatization can improve execution time by up to 86% (by reducing false conflicts by up to 96%).
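The mechanisms summarized above can be illustrated with a small Python model. This is a minimal, non-authoritative sketch, not the paper's hardware design. It assumes 64-byte cache blocks, 32-bit addresses, and four parallel Bloom filters of 512 bits each. `PBXStyleHash` XORs two bit fields of the block address, but its field positions (the `high_shift` values) are our assumption, not the entropy-derived selection from the paper. `TxSignatures` and its `private_ranges` argument are hypothetical stand-ins for the privatization interface: an address in a declared-private range is never inserted into the read or write signature, so it cannot set bits that later produce false conflicts.

```python
import random

BLOCK_BITS = 6                  # assumption: 64-byte cache blocks
OUT_BITS = 9                    # assumption: 512 bits per Bloom filter
FILTER_MASK = (1 << OUT_BITS) - 1


def block_addr(addr):
    """Drop the block-offset bits; signatures track cache blocks."""
    return addr >> BLOCK_BITS


class H3Hash:
    """H3 (Carter & Wegman): XOR together one random row per set input bit."""

    def __init__(self, in_bits, rng):
        self.rows = [rng.getrandbits(OUT_BITS) for _ in range(in_bits)]

    def __call__(self, x):
        h = 0
        for i, row in enumerate(self.rows):
            if (x >> i) & 1:
                h ^= row
        return h


class PBXStyleHash:
    """PBX-style hash (assumed bit selection): XOR the low OUT_BITS of the
    block address with a higher OUT_BITS-wide field, i.e. one two-input XOR
    per output bit."""

    def __init__(self, high_shift):
        self.high_shift = high_shift  # start of the higher field (assumption)

    def __call__(self, x):
        return (x & FILTER_MASK) ^ ((x >> self.high_shift) & FILTER_MASK)


class ParallelBloomSignature:
    """k parallel Bloom filters, each with its own hash and bit array."""

    def __init__(self, hashes):
        self.hashes = hashes
        self.filters = [0] * len(hashes)  # each filter stored as an int bitmask

    def insert(self, addr):
        b = block_addr(addr)
        for j, h in enumerate(self.hashes):
            self.filters[j] |= 1 << h(b)

    def may_contain(self, addr):
        # Always True for inserted blocks (no false negatives); may also be
        # True for a block that was never inserted (a false positive).
        b = block_addr(addr)
        return all((f >> h(b)) & 1 for f, h in zip(self.filters, self.hashes))


class TxSignatures:
    """Read/write signatures that skip addresses declared private."""

    def __init__(self, make_signature, private_ranges):
        self.read_sig = make_signature()
        self.write_sig = make_signature()
        self.private_ranges = private_ranges  # hypothetical [(start, end)), half-open

    def is_private(self, addr):
        return any(lo <= addr < hi for lo, hi in self.private_ranges)

    def on_load(self, addr):
        if not self.is_private(addr):
            self.read_sig.insert(addr)

    def on_store(self, addr):
        if not self.is_private(addr):
            self.write_sig.insert(addr)

    def conflicts_with(self, addr, remote_is_store):
        # A remote store conflicts with local reads or writes;
        # a remote load conflicts only with local writes.
        if remote_is_store and self.read_sig.may_contain(addr):
            return True
        return self.write_sig.may_contain(addr)


rng = random.Random(42)
h3_hashes = [H3Hash(32 - BLOCK_BITS, rng) for _ in range(4)]
pbx_hashes = [PBXStyleHash(shift) for shift in (9, 12, 15, 18)]


def make_h3():
    return ParallelBloomSignature(h3_hashes)


def make_pbx():
    return ParallelBloomSignature(pbx_hashes)


tx = TxSignatures(make_pbx, private_ranges=[(0x100000, 0x200000)])
tx.on_store(0x100040)  # private: not recorded in the write signature
tx.on_load(0x300080)   # shared: recorded in the read signature
```

Both hash families in this sketch are linear over GF(2), so h(x XOR y) = h(x) XOR h(y). The hardware cost differs: each H3 output bit is an XOR tree over roughly half of the address bits, while each PBX-style output bit here needs only a single two-input XOR.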