Implementing Signatures for Transactional Memory

Transactional Memory (TM) systems must track the read and write sets (the items read and written during a transaction) to detect conflicts among concurrent transactions. Several TMs use signatures, which summarize unbounded read/write sets in bounded hardware at the performance cost of false positives (conflicts detected when none exist). This paper examines different organizations to achieve hardware-efficient and accurate TM signatures. First, we find that implementing each signature with a single k-hash-function Bloom filter (True Bloom signature) is inefficient, as it requires multi-ported SRAMs. Instead, we advocate using k single-hash-function Bloom filters in parallel (Parallel Bloom signature), which can be built from area-efficient single-ported SRAMs. Our formal analysis shows that both organizations perform equally well in theory, and our simulation-based evaluation shows this to hold approximately in practice. We also show that by choosing high-quality hash functions we can achieve signature designs noticeably more accurate than the previously proposed implementations. Finally, we adapt Pagh and Rodler's cuckoo hashing to implement Cuckoo-Bloom signatures. While this representation does not support set intersection, it mitigates false positives for the common case of small read/write sets and performs like a Bloom filter for large sets.
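
The Parallel Bloom organization described above can be illustrated with a minimal software sketch. This is not the paper's hardware design: the class name, the method names, the default sizes (2048 bits, k = 4), and the use of salted BLAKE2b as a stand-in for hardware hash functions are all assumptions made for illustration. Each of the k banks has its own bit array and its own hash function, so an insert or a lookup touches exactly one bit per bank, which is what would let each bank be a single-ported memory.

```python
import hashlib


class ParallelBloomSignature:
    """Sketch of a signature built from k single-hash-function Bloom filters.

    All names and sizes here are hypothetical, chosen only for illustration.
    """

    def __init__(self, total_bits=2048, k=4):
        if total_bits % k != 0:
            raise ValueError("total_bits must be divisible by k")
        self.k = k
        self.bank_bits = total_bits // k
        # One Python int per bank, used as a bit vector.
        self.banks = [0] * k

    def _index(self, bank, addr):
        # Assumption: salted BLAKE2b stands in for a per-bank hardware hash.
        digest = hashlib.blake2b(
            addr.to_bytes(8, "little"), digest_size=8, salt=bytes([bank])
        ).digest()
        return int.from_bytes(digest, "little") % self.bank_bits

    def insert(self, addr):
        # Set exactly one bit in each bank.
        for bank in range(self.k):
            self.banks[bank] |= 1 << self._index(bank, addr)

    def might_contain(self, addr):
        # True if the address may be in the set (false positives possible);
        # False means it is definitely not in the set.
        return all(
            (self.banks[bank] >> self._index(bank, addr)) & 1
            for bank in range(self.k)
        )

    def clear(self):
        # Reset the signature, e.g. when a transaction commits or aborts.
        self.banks = [0] * self.k


# Usage: summarize a transaction's write set, then test incoming addresses.
write_sig = ParallelBloomSignature()
for addr in (0x1000, 0x1040, 0x2080):
    write_sig.insert(addr)
conflict_possible = write_sig.might_contain(0x1040)
```

A lookup that returns False rules out a conflict, while a lookup that returns True only signals a possible conflict, which is the false-positive cost the abstract refers to.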
