Breaking the Synchronization Bottleneck with Reconfigurable Transactional Execution

The advent of FPGA-based hybrid architectures offers the opportunity to customize memory subsystems to enhance overall system performance. However, it is not straightforward to design efficient FPGA circuits for emerging FPGA applications such as in-memory databases and graph analytics, which depend heavily on concurrent data structures (CDS’). The highly dynamic behavior of CDS’ must be orchestrated by synchronization primitives to ensure correct execution, and on FPGAs these primitives induce overwhelming memory traffic. This paper proposes a novel method for systematically exploring and exploiting the memory-level parallelism (MLP) of CDS’ through transactional execution on FPGAs. Inspired by the idea that transactional semantics can be implemented more efficiently and scalably on FPGAs than on CPUs, we propose a transaction-based reconfigurable runtime system that captures the MLP of CDS’. Experiments on linked lists and skip lists show that our approach achieves average throughput improvements of 5.18x over lock-based FPGA implementations and 1.55x over optimized CDS algorithms running on a state-of-the-art multi-core CPU.