An Extended Nonstrict Partially Ordered Set-Based Configurable Linear Sorter on FPGAs

Sorting is essential to many scientific and data-processing problems, so improving its efficiency is important. By exploiting specialized hardware, parallel sorting approaches such as sorting networks and linear sorters achieve lower time complexity. However, most of them are designed by parallelizing existing algorithms, with little consideration of the underlying hardware structures. In this article, we propose an extended nonstrict partially ordered set-based configurable linear sorter on field-programmable gate arrays (FPGAs). First, we extend the nonstrict partial order to binary tuple and n-tuple nonstrict partial orders. Then, we define a linear sorting algorithm based on these orders, taking hardware performance into account. Its time complexity is 4N/n, which varies between 4 and 2N depending on the tuple size n. In binary tuple-based sorting, the number of comparisons drops to N/2, half that of the state-of-the-art insertion linear sorter. Finally, we implement the linear sorter on FPGAs. It consists of multiple customizable micro-cores called sorting units (SUs), each of which packages the storage and comparison of one tuple. The SUs are connected into a chain with simple communication, which makes the sorter fully configurable in length, bandwidth, and throughput. Because all SUs perform the same operation in every clock cycle, the sorter also reaches a higher clock frequency. In our experiments, the sorter achieves a frequency of up to 660 MHz, a throughput of up to 5.6 Gb/s, and a speed-up of up to 87 times over the quicksort algorithm on general-purpose processors.
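The abstract measures the proposed sorter against state-of-the-art insertion linear sorting. As a point of reference only, the sketch below is a cycle-level Python model of that classic baseline, not the authors' n-tuple algorithm, whose comparison rules are not given here. It assumes one key per cell, one new key broadcast per clock cycle, ascending order, and a chain at least as long as the input; the names `InsertionLinearSorter`, `clock`, and `linear_sort` are hypothetical.

```python
from typing import List, Optional


class InsertionLinearSorter:
    """Cycle-level model of a classic linear insertion sorter (baseline).

    A chain of identical cells keeps its contents sorted in ascending order,
    with empty cells (None) at the tail. Each cycle, one new key is broadcast
    to every cell. Each cell compares it with its stored key and, using only
    its left neighbour's result, decides to hold, load the key, or shift.
    """

    def __init__(self, length: int) -> None:
        # One storage slot per cell; None marks an empty cell.
        self.cells: List[Optional[int]] = [None] * length

    def clock(self, key: int) -> None:
        old = self.cells
        # Phase 1: every cell checks "does the new key belong at or before me?"
        # Empty cells always answer yes; ties go after equal keys (stable).
        goes_before = [stored is None or key < stored for stored in old]
        # Phase 2: all cells update simultaneously from the old state.
        new: List[Optional[int]] = []
        for i, stored in enumerate(old):
            if not goes_before[i]:
                new.append(stored)          # hold the current key
            elif i == 0 or not goes_before[i - 1]:
                new.append(key)             # insertion point: load new key
            else:
                new.append(old[i - 1])      # shift left neighbour's key right
        self.cells = new


def linear_sort(keys: List[int]) -> List[int]:
    """Feed keys one per cycle into a chain exactly as long as the input."""
    sorter = InsertionLinearSorter(len(keys))
    for key in keys:
        sorter.clock(key)
    # After len(keys) cycles the chain is full; filter Nones for typing only.
    return [k for k in sorter.cells if k is not None]


print(linear_sort([5, 3, 8, 1, 9, 2]))  # [1, 2, 3, 5, 8, 9]
```

Because every cell applies the same local rule and communicates only with its left neighbour, the chain can be lengthened simply by appending cells, which is the property the abstract attributes to its SU chain. In this baseline each cell owns one comparator, so an N-cell chain performs up to N comparisons per cycle; the abstract reports N/2 for its binary tuple-based design.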
