A comparison of three representative hardware sorting units

Sorting is an important operation in many embedded computing systems. Because sorting large datasets can slow down overall execution, schemes to speed up sorting operations are needed. With hardware acceleration of sorting in mind, this paper analyzes and compares three hardware sorting units: a sorting network, an insertion sorting unit, and a FIFO-based merge sorting unit. We focus on embedded computing systems implemented on FPGAs, which provide the flexibility to accommodate customized hardware sorting units. We also present a hardware/software solution for sorting data sets larger than the capacity of the sorting unit. This hardware/software solution achieves a 20× overall speedup over a pure software implementation of the well-known quicksort algorithm.
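
The abstract does not say how the hardware/software split is organized, so the following is only an illustrative sketch of one common arrangement, not necessarily the scheme used in the paper. It assumes the data set is cut into blocks that fit the sorting unit, each block is sorted by the unit, and the sorted blocks are then merged in software. The names `UNIT_SIZE`, `hw_sort_unit`, and `hybrid_sort` are hypothetical, and the hardware unit is modeled in software by an odd-even transposition sorting network.

```python
import heapq

# Hypothetical capacity of the hardware sorting unit (an assumption).
UNIT_SIZE = 8


def hw_sort_unit(block):
    """Software stand-in for the hardware unit.

    Models an odd-even transposition sorting network: n stages of
    compare-exchange operations on adjacent pairs sort n elements.
    """
    data = list(block)
    n = len(data)
    for stage in range(n):
        # Even stages compare pairs (0,1), (2,3), ...; odd stages (1,2), (3,4), ...
        for i in range(stage % 2, n - 1, 2):
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
    return data


def hybrid_sort(values, unit_size=UNIT_SIZE):
    """Sort a list larger than the unit: block-sort, then merge in software."""
    # Step 1: each block of at most unit_size elements goes through the unit.
    runs = [
        hw_sort_unit(values[i:i + unit_size])
        for i in range(0, len(values), unit_size)
    ]
    # Step 2: a k-way merge of the sorted runs, done in software.
    return list(heapq.merge(*runs))


if __name__ == "__main__":
    sample = [23, 4, 42, 15, 8, 16, 1, 99, 7, 3, 56, 0, 12, 31, 5, 77, 2, 64]
    print(hybrid_sort(sample))
```

In this sketch the compare-exchange operations within one stage are independent of each other, which is what lets a hardware network perform them in parallel, while the merge step remains sequential software work.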
