Parallelizing Data Processing on FPGAs with Shifter Lists

Parallelism is currently seen as a mechanism to minimize the impact of the power and heat dissipation problems encountered in modern hardware. Data parallelism (based on partitioning the data) and pipeline parallelism (based on partitioning the computation) are the two main approaches to leveraging parallelism on a wide range of hardware platforms. Unfortunately, not all data processing problems are amenable to either of those strategies. An example is the skyline operator [Börzsönyi et al. 2001], which computes the set of Pareto-optimal points within a multidimensional dataset. Existing approaches to parallelizing the skyline operator are based on data parallelism. Because partitioning the input data deprives each worker of a global view of the problem, these approaches suffer from a high overhead when merging intermediate results. In this article, we show how to combine pipeline parallelism with data parallelism on a Field-Programmable Gate Array (FPGA) to utilize the available hardware parallelism more efficiently. As we show in our experiments, skyline computation using our proposed technique scales linearly with the number of processing elements, and the performance we achieve on a rather small FPGA is comparable to that of a 64-core high-end server running a state-of-the-art data-parallel implementation of skyline [Park et al. 2009]. The proposed approach to parallelizing the skyline operator can be generalized to a wider range of data processing problems. We demonstrate this through a novel, highly parallel data structure, the shifter list, that can be efficiently implemented on an FPGA. The resulting template is easy to parametrize to implement a variety of computationally intensive operators such as frequent items, n-closest pairs, or K-means.
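
To make the skyline operator and the merge overhead mentioned above concrete, the following is a minimal sequential sketch in Python. It is not the FPGA design or the shifter list described in the article. It assumes that smaller values are better in every dimension, and the names `dominates`, `skyline`, and `partitioned_skyline` are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """Return True if p dominates q.

    Assumption: smaller is better in every dimension. p dominates q when p is
    no worse than q everywhere and strictly better in at least one dimension.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: Sequence[Point]) -> List[Point]:
    """Compute the Pareto-optimal points with a simple window-based scan."""
    window: List[Point] = []
    for p in points:
        # Drop p if some point already in the window dominates it.
        if any(dominates(w, p) for w in window):
            continue
        # Otherwise p survives and evicts every window point it dominates.
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window


def partitioned_skyline(points: Sequence[Point], parts: int) -> List[Point]:
    """Data-parallel style: local skylines per partition, then a merge pass.

    The partitions are processed one after another here; the point is only to
    show that local results are candidates and still need a global merge.
    """
    chunks = [points[i::parts] for i in range(parts)]
    local_results = [skyline(chunk) for chunk in chunks]
    candidates = [p for result in local_results for p in result]
    # A locally optimal point may be dominated by a point from another
    # partition, so the candidates have to be compared against each other.
    return skyline(candidates)


if __name__ == "__main__":
    # Hypothetical data: (price, distance) pairs, both to be minimized.
    hotels = [(50, 8), (80, 2), (60, 5), (90, 3), (70, 6)]
    print(skyline(hotels))
    print(partitioned_skyline(hotels, parts=2))
```

In `partitioned_skyline`, no partition can discard a point on behalf of another partition, so the final merge repeats dominance tests over all local candidates. This is the merge overhead that the abstract attributes to purely data-parallel approaches.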

[1] Karthikeyan Sankaralingam, et al. Dark Silicon and the End of Multicore Scaling, 2012, IEEE Micro.

[2] Vassilis J. Tsotras, et al. Massively parallel XML twig filtering using dynamic programming on FPGAs, 2011, 2011 IEEE 27th International Conference on Data Engineering.

[3] Jonghyun Park, et al. Parallel Skyline Computation on Multicore Architectures, 2009, 2009 IEEE 25th International Conference on Data Engineering.

[4] Jim Tørresen, et al. FPGASort: a high performance sorting architecture exploiting run-time reconfiguration on FPGAs for large problem sorting, 2011, FPGA '11.

[5] Takashi Takenaka, et al. 20Gbps C-Based Complex Event Processing, 2011, 2011 21st International Conference on Field Programmable Logic and Applications.

[6] Gilles Kahn, et al. The Semantics of a Simple Language for Parallel Programming, 1974, IFIP Congress.

[7] Jarek Gryz, et al. Maximal Vector Computation in Large Data Sets, 2005, VLDB.

[8] Satnam Singh. Computing without processors, 2012, CODES+ISSS '12.

[9] Donald Kossmann, et al. The Skyline operator, 2001, Proceedings 17th International Conference on Data Engineering.

[10] Ken Eguro, et al. SIRC: An Extensible Reconfigurable Computing Communication API, 2010, 2010 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines.

[11] James C. Hoe, et al. CoRAM: an in-fabric memory architecture for FPGA-based computing, 2011, FPGA '11.

[12] Parthasarathy Ranganathan, et al. From Microprocessors to Nanostores: Rethinking Data-Centric Systems, 2011, Computer.

[13] Jürgen Teich, et al. On-the-fly Composition of FPGA-Based SQL Query Accelerators Using a Partially Reconfigurable Module Library, 2012, 2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines.

[14] Kunle Olukotun, et al. The Future of Microprocessors, 2005, ACM Queue.

[15] Alan M. Frieze, et al. Clustering Large Graphs via the Singular Value Decomposition, 2004, Machine Learning.

[16] Scott A. Mahlke, et al. Optimus: efficient realization of streaming applications on FPGAs, 2008, CASES '08.

[17] Seung-won Hwang, et al. VSkyline: vectorization for efficient skyline computation, 2010, SGMD.

[18] Riccardo Torlone, et al. Which are my preferred items, 2002.

[19] Bharat Sukhwani, et al. Database analytics acceleration using FPGAs, 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).

[20] Divyakant Agrawal, et al. An integrated efficient solution for computing frequent and top-k elements in data streams, 2006, TODS.

[21] Ray Bittner, et al. The Speedy DDR2 Controller For FPGAs, 2009, ERSA.

[22] Gustavo Alonso, et al. FPGA acceleration for the frequent item problem, 2010, 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010).