Asymmetric Allocation in a Shared Flexible Signature Module for Multicore Processors

Hardware signatures based on Bloom filters are used to support and accelerate membership queries over a set of items. They require modest hardware at the cost of false positives, but never produce false negatives. Signatures have traditionally been used in distributed and network applications, but in recent years their use has extended to other fields, for instance support for manycore/multicore parallel programming, such as data race detection, deterministic replay, and transactional memory (TM). One drawback of signatures is their fixed size: a signature size that suits one application may not be appropriate for another. Recently, we proposed a shared hardware module for managing signatures based on a collection of Bloom filters. The module can host a variable number of signatures whose sizes change at runtime to adapt to application demand. However, it assigns resources according to a single symmetric policy for all allocations, which limits its adaptability to different workloads. In this paper, we explore new techniques to allocate signatures in this module asymmetrically, with the aim of optimizing resource usage and further reducing the number of false positives. We explore several asymmetric strategies and their efficient hardware implementation, and we show specific examples using TM as a driver application. The results show that these strategies significantly reduce the number of false positives compared with symmetric policies.
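As a rough illustration of the trade-off described above, the sketch below models a signature in software as an m-bit Bloom filter with k hash functions, and uses the standard estimate p ≈ (1 - e^(-kn/m))^k for the false positive probability after n insertions. Everything else is an assumption made for illustration and is not the module or the allocation strategies proposed in this paper: the SHA-256-based hashing, the hypothetical pool of 16 banks of 128 bits, the read/write set sizes, the hypothetical `best_split` policy (an exhaustive search over bank splits), and the cost metric (the sum of the two estimated false positive rates). The model also treats the banks assigned to a signature as a single m-bit filter, which is a simplification.

```python
import hashlib
import math


class BloomSignature:
    """Software model of a signature: an m-bit Bloom filter with k hash functions."""

    def __init__(self, m_bits: int, k: int) -> None:
        self.m = m_bits
        self.k = k
        self.bits = bytearray(m_bits)  # one byte per bit, for readability

    def _positions(self, item: int):
        # Assumption: k bit positions derived from salted SHA-256 digests.
        # Hardware designs commonly use cheaper hashes (e.g. H3 or bit selection).
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.m

    def insert(self, item: int) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def test(self, item: int) -> bool:
        # True means "possibly present" (may be a false positive);
        # False means "definitely absent" (no false negatives).
        return all(self.bits[pos] for pos in self._positions(item))


def fp_rate(m_bits: int, n_items: int, k: int) -> float:
    """Standard false positive estimate: (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-k * n_items / m_bits)) ** k


def best_split(total_banks: int, bank_bits: int, n_a: int, n_b: int, k: int):
    """Hypothetical asymmetric policy: try every split of the banks between two
    signatures (at least one bank each) and keep the split with the lowest
    summed false positive estimate. Returns (banks_a, banks_b, cost)."""
    best = None
    for a in range(1, total_banks):
        b = total_banks - a
        cost = fp_rate(a * bank_bits, n_a, k) + fp_rate(b * bank_bits, n_b, k)
        if best is None or cost < best[2]:
            best = (a, b, cost)
    return best


# Inserted items always test positive: no false negatives.
sig = BloomSignature(m_bits=1024, k=4)
addrs = range(0, 200 * 64, 64)  # 200 illustrative cache-line addresses
for addr in addrs:
    sig.insert(addr)
assert all(sig.test(addr) for addr in addrs)

# Illustrative transaction: read set much larger than write set (not measured data).
K, BANKS, BANK_BITS = 4, 16, 128
n_read, n_write = 200, 20
half = BANKS // 2
symmetric = fp_rate(half * BANK_BITS, n_read, K) + fp_rate(half * BANK_BITS, n_write, K)
a, b, asymmetric = best_split(BANKS, BANK_BITS, n_read, n_write, K)
print(f"symmetric {half}/{half}: {symmetric:.4f}  asymmetric {a}/{b}: {asymmetric:.4f}")
```

With these illustrative sizes, the search assigns most banks to the larger read-set signature, and the summed estimate falls well below that of the even split. These figures come from the analytic model only, not from measurements of the proposed module.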
