Low-Cost Sorting Network Circuits Using Unary Processing

Sorting is a common task in a wide range of applications, from signal and image processing to switching systems. For applications that require high performance, sorting is often performed in hardware with application-specific integrated circuits or field-programmable gate arrays, where hardware cost and power consumption are the dominant concerns. The usual approach is to wire up a network of compare-and-swap units in a configuration called the Batcher (or bitonic) network. Such networks can readily be pipelined. This paper proposes a novel area- and power-efficient approach to sorting networks, based on “unary processing.” In unary processing, a number is encoded as a stream of 0’s and 1’s in which all occurrences of one value (say 1) come first, followed by all occurrences of the other value (say 0); the number is defined by the fraction of 1’s in the stream. Synthesis results for complete sorting networks show area and power savings of up to 92% compared to conventional binary implementations, at the cost of increased latency. To mitigate the increased latency, this paper uses a novel time-encoding of data. The approach is validated with two implementations of an important application of sorting: median filtering. The result is a low-cost, energy-efficient implementation of median filtering with only a slight accuracy loss compared to conventional implementations.
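To make the unary encoding concrete, the following is a minimal software sketch, not the circuit described in the paper. It assumes streams with all 1's first and equal length, in which case a bitwise AND of two streams gives the smaller value and a bitwise OR gives the larger, so a compare-and-swap unit reduces to two gates per bit position. The helper names (`to_unary`, `from_unary`, `compare_and_swap`, `median_of_three`) and the stream length `N` are illustrative assumptions, and the three-input network is just one small example of a sorting network used to pick a median.

```python
from typing import List, Tuple

Stream = List[int]  # a unary stream: 1's first, then 0's


def to_unary(value: int, length: int) -> Stream:
    """Encode an integer in 0..length as `value` ones followed by zeros."""
    if not 0 <= value <= length:
        raise ValueError("value out of range for stream length")
    return [1] * value + [0] * (length - value)


def from_unary(stream: Stream) -> int:
    """Decode by counting the 1's in the stream."""
    return sum(stream)


def compare_and_swap(a: Stream, b: Stream) -> Tuple[Stream, Stream]:
    """Return (min, max) of two aligned unary streams.

    Assumption: both streams have the same length and are 1's-first,
    so bitwise AND yields the minimum and bitwise OR the maximum.
    """
    lo = [x & y for x, y in zip(a, b)]
    hi = [x | y for x, y in zip(a, b)]
    return lo, hi


def median_of_three(a: Stream, b: Stream, c: Stream) -> Stream:
    """Three-input sorting network: CAS(0,1), CAS(1,2), CAS(0,1)."""
    a, b = compare_and_swap(a, b)
    b, c = compare_and_swap(b, c)
    a, b = compare_and_swap(a, b)
    return b  # middle output of the sorted triple


N = 8  # hypothetical stream length; values range over 0..N
streams = [to_unary(v, N) for v in (5, 2, 7)]
median = from_unary(median_of_three(*streams))
print(median)
```

In this sketch each stream has one bit per representable level, so a value that is processed one bit per cycle takes as many cycles as the stream is long. That is one way to see why the simpler logic comes with the latency increase noted above.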
