Constructing Concurrent Data Structures on FPGA with Channels

The performance of High-Level Synthesis (HLS) applications with irregular data structures is limited by the imperative programming paradigm of their input languages, such as C/C++. In this paper, we show that constructing concurrent data structures with channels, a programming construct derived from the communicating sequential processes (CSP) paradigm, is an effective approach to improving the performance of these applications. We evaluate concurrent data structures for FPGAs by synthesizing a K-means clustering (KMC) algorithm on the Intel HARP2 platform. With the help of a single-producer-single-consumer (SPSC) queue and a stack built from channels, a fully pipelined KMC processing element can be synthesized from OpenCL, achieving a 15.2x speedup over a sequential baseline. The number of processing elements can be scaled up by leveraging a multiple-producer-multiple-consumer (MPMC) stack with work distribution for dynamic load balancing. Evaluation shows that an additional 3.5x speedup is achieved when four processing elements are instantiated. These results show that concurrent data structures built with channels have great potential for improving the parallelism of HLS applications. We hope that our study will stimulate further research into the potential of channel-based HLS.
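The abstract does not spell out how a stack can be "built from channels". One common CSP-style approach, assumed here purely for illustration and not necessarily the design used in this paper, is to give the stack to a single server process that owns the storage and serves push/pop requests arriving on a channel. Because only the server touches the array, concurrent clients need no locks or atomics. The sketch below models this in Python, with `queue.Queue` standing in for a bounded blocking channel and threads standing in for OpenCL kernels; the names `stack_server`, `StackClient`, `start_stack`, `stop_stack`, and the request tags are hypothetical.

```python
import queue
import threading

# Illustrative software model only, not the paper's OpenCL implementation.
# Hypothetical request tags; a hardware design might encode these as bit fields.
PUSH, POP, STOP = "push", "pop", "stop"


def stack_server(requests, depth):
    """Single owner of the stack storage, in the style of a CSP process.

    Each request is (op, value, reply_channel). Only this thread reads or
    writes `storage` and `top`, so clients need no locks or atomics.
    """
    storage = [None] * depth  # fixed capacity, like an on-chip buffer
    top = 0
    while True:
        op, value, reply = requests.get()  # blocking channel read
        if op == PUSH:
            ok = top < depth
            if ok:
                storage[top] = value
                top += 1
            reply.put(ok)  # True on success, False if the stack is full
        elif op == POP:
            if top > 0:
                top -= 1
                reply.put(storage[top])
            else:
                reply.put(None)  # None marks "stack empty" in this sketch
        elif op == STOP:
            return


class StackClient:
    """A client endpoint; each client owns a private reply channel."""

    def __init__(self, requests):
        self.requests = requests
        self.reply = queue.Queue(maxsize=1)

    def push(self, value):
        self.requests.put((PUSH, value, self.reply))
        return self.reply.get()

    def pop(self):
        self.requests.put((POP, None, self.reply))
        return self.reply.get()


def start_stack(depth, channel_depth=4):
    """Create the shared request channel and launch the server thread."""
    requests = queue.Queue(maxsize=channel_depth)  # bounded, like a hardware FIFO
    server = threading.Thread(target=stack_server, args=(requests, depth), daemon=True)
    server.start()
    return requests, server


def stop_stack(requests, server):
    """Ask the server to exit and wait for it."""
    requests.put((STOP, None, None))
    server.join()


if __name__ == "__main__":
    requests, server = start_stack(depth=8)
    client = StackClient(requests)
    for x in (1, 2, 3):
        client.push(x)
    print([client.pop() for _ in range(4)])  # [3, 2, 1, None]
    stop_stack(requests, server)
```

Because each request carries the requester's own reply channel, several producers and consumers can share one request channel, which corresponds to the MPMC case; the server handles their operations one at a time. A hardware version would presumably replace the Python list with on-chip memory and `queue.Queue` with FPGA channels, and would need a separate empty flag if `None`-like values can be stored.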
