A Hybrid Pipelined Architecture for High Performance Top-K Sorting on FPGA

We present a hybrid pipelined sorting architecture that finds and outputs the $K$ largest elements of an input sequence. The architecture consists of a bitonic sorter followed by $L$ cascaded sorting units. The bitonic sorter generates a segmented ordered sequence, and each sorting unit processes this sequence to identify and output the $P$ largest elements. To increase throughput and lower latency, each sorting unit is designed to output $P$ elements every cycle. Hence, the $K = PL$ largest elements are obtained after the segmented ordered sequence passes through the $L$ cascaded sorting units. The architecture supports variable-length and continuous sequences. Implementation results show that it achieves a throughput of 22.88 GB/s with $P = 16$ on a state-of-the-art field-programmable gate array (FPGA).
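The data flow described in the abstract can be illustrated with a small behavioural model in software. This is only a sketch under stated assumptions, not the hardware design itself: it assumes that the "segmented ordered sequence" consists of consecutive $P$-element segments, each sorted in descending order, and that each sorting unit keeps the $P$ largest elements it has seen and forwards the rest to the next unit. The names `segment_sort`, `SortingUnit`, and `top_k` are hypothetical, and cycle-level timing and pipelining are not modelled.

```python
from typing import List


def segment_sort(seq: List[int], p: int) -> List[List[int]]:
    """Stand-in for the bitonic sorter: cut the input into P-element
    segments and sort each segment in descending order (assumed reading
    of 'segmented ordered sequence')."""
    return [sorted(seq[i:i + p], reverse=True) for i in range(0, len(seq), p)]


class SortingUnit:
    """Behavioural model of one cascaded sorting unit (assumption):
    retains the P largest elements seen so far, forwards the others."""

    def __init__(self, p: int) -> None:
        self.p = p
        self.kept: List[int] = []

    def push(self, segment: List[int]) -> List[int]:
        # Merge the incoming segment with the retained elements.
        merged = sorted(self.kept + segment, reverse=True)
        self.kept = merged[:self.p]   # P largest stay in this unit
        return merged[self.p:]        # at most P elements move downstream


def top_k(seq: List[int], p: int, l: int) -> List[int]:
    """Return the K = P * L largest elements in descending order."""
    units = [SortingUnit(p) for _ in range(l)]
    for segment in segment_sort(seq, p):
        for unit in units:
            segment = unit.push(segment)
            if not segment:           # nothing left to forward
                break
    result: List[int] = []
    for unit in units:                # unit i holds ranks i*P+1 .. (i+1)*P
        result.extend(unit.kept)
    return result


if __name__ == "__main__":
    data = [5, 1, 9, 3, 7, 2, 8, 6, 4, 0, 11, 10]
    print(top_k(data, p=2, l=2))
```

In this model the first unit ends up holding the $P$ largest elements, the second unit the next $P$ largest, and so on, so concatenating the contents of the $L$ units yields the $K = PL$ largest elements. Elements that fall out of the last unit are discarded, and input length is not fixed in advance, which mirrors the support for variable-length sequences claimed above.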
