Reducing misses to external memory accesses in task-level pipelining

Recently, researchers have shown increased interest in using task-level pipelining to accelerate applications that consist mainly of producer-consumer tasks. This paper proposes optimization techniques that enhance our approach to pipelining the execution of producer-consumer tasks in FPGA-based multicore architectures by reducing the number of accesses to external memory. Our approach speeds up the overall execution of successive, data-dependent tasks by using multiple cores and specific customization features provided by FPGAs. We evaluate how different hash functions and optimization schemes in the inter-stage buffer (ISB) affect the performance of task-level pipelining. The proposed optimizations were evaluated with FPGA implementations. The experimental results show the efficiency of a simple scheme for reducing external memory accesses and the suitability of the hash function used. Furthermore, the results reveal noticeable performance improvements for the set of benchmarks used.
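
As a rough illustration of why the choice of hash function in the ISB matters for external memory traffic, the following is a minimal software sketch, not the paper's hardware design. It assumes a direct-mapped buffer in which a producer write that collides with an occupied slot spills to external memory, and the consumer later pays a second external access to fetch it. The class name `InterStageBuffer`, the default modulo hash, and the spill policy are all hypothetical choices made for this example.

```python
class InterStageBuffer:
    """Toy model of a hash-indexed inter-stage buffer (ISB).

    Assumption: one entry per slot; colliding writes spill to external
    memory. This is an illustrative policy, not the paper's scheme.
    """

    def __init__(self, size, hash_fn=None):
        self.size = size
        # Hypothetical default hash: simple modulo on the array index.
        self.hash_fn = hash_fn or (lambda index: index % size)
        self.slots = [None] * size      # each slot: (index, value) or None
        self.external = {}              # stands in for external memory
        self.external_accesses = 0      # counts external reads and writes

    def produce(self, index, value):
        """Producer stage writes one element of the shared array."""
        slot = self.hash_fn(index)
        entry = self.slots[slot]
        if entry is None or entry[0] == index:
            self.slots[slot] = (index, value)
        else:
            # Slot is held by another index: spill to external memory.
            self.external[index] = value
            self.external_accesses += 1

    def consume(self, index):
        """Consumer stage reads one element; None means not yet produced."""
        slot = self.hash_fn(index)
        entry = self.slots[slot]
        if entry is not None and entry[0] == index:
            self.slots[slot] = None     # free the slot once consumed
            return entry[1]
        if index in self.external:
            self.external_accesses += 1
            return self.external.pop(index)
        return None


isb = InterStageBuffer(size=4)
isb.produce(0, "a")
isb.produce(1, "b")
isb.produce(4, "c")                     # 4 % 4 == 0, collides with index 0
first = isb.consume(0)                  # served from the ISB
spilled = isb.consume(4)                # served from external memory
missing = isb.consume(7)                # never produced
```

In this toy run the single collision costs two external accesses (one write, one read), which is the kind of traffic a better-distributed hash function or a conflict-handling scheme would aim to avoid.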
