A Parallel Radix-Sort-Based VLSI Architecture for Finding the First $W$ Maximum/Minimum Values

Very-large-scale integration (VLSI) architectures for finding the first W (W>2) maximum (or minimum) values are required in several applications, such as nonbinary low-density parity-check decoders, K-best multiple-input multiple-output (MIMO) detectors, and turbo product codes. In this brief, a parallel radix-sort-based VLSI architecture for finding the first W maximum (or minimum) values is proposed. The described architecture, called the Bit-Wise-And (BWA) architecture, analyzes the input data from the most significant bit to the least significant one using very simple logic circuits. A key feature of the BWA architecture is its high scalability, which enables its adoption in a large spectrum of applications, covering wide ranges of both W and the size of the input data set. Experimental results, obtained by implementing the proposed architecture in a high-speed 90-nm CMOS standard-cell technology, show that the BWA architecture requires significantly less area than other solutions available in the literature, i.e., less than or about 50% in all the considered cases. Moreover, the BWA architecture exhibits the lowest area-delay product in almost all considered cases.
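
The most-significant-bit-first analysis mentioned above can be illustrated with a small software model. The following is only a sketch of a generic MSB-first radix selection, assuming unsigned fixed-width inputs; it is not the BWA circuit itself, whose gate-level structure the abstract does not give. The names `first_w_maxima`, `first_w_minima`, `selected`, and `candidates` are hypothetical and chosen for illustration.

```python
def first_w_maxima(values, w, bits):
    """Return the w largest entries of `values`, in descending order.

    Assumption: every value is an unsigned integer below 2**bits.
    """
    if not 0 <= w <= len(values):
        raise ValueError("w must be between 0 and len(values)")
    if any(v < 0 or v >> bits for v in values):
        raise ValueError("values must fit in `bits` unsigned bits")

    selected = []                          # indices already known to be in the top w
    candidates = list(range(len(values)))  # indices still undecided

    # Scan bit positions from the most significant to the least significant.
    for b in range(bits - 1, -1, -1):
        need = w - len(selected)
        if need == 0:
            break
        ones = [i for i in candidates if (values[i] >> b) & 1]
        zeros = [i for i in candidates if not (values[i] >> b) & 1]
        if len(ones) >= need:
            # Enough candidates have a 1 here: those with a 0 cannot be in the top w.
            candidates = ones
        else:
            # Every candidate with a 1 here is in the top w; keep looking among the rest.
            selected += ones
            candidates = zeros

    # Any remaining candidates are equal in all scanned bits; take as many as needed.
    selected += candidates[: w - len(selected)]
    return sorted((values[i] for i in selected), reverse=True)


def first_w_minima(values, w, bits):
    """Return the w smallest entries, in ascending order, by complementing the inputs."""
    mask = (1 << bits) - 1
    flipped = first_w_maxima([mask - v for v in values], w, bits)
    return [mask - v for v in flipped]
```

For example, `first_w_maxima([5, 12, 7, 12, 3, 9, 0, 15], 3, 4)` returns `[15, 12, 12]`, and `first_w_minima` on the same inputs returns `[0, 3, 5]`. In this model each bit position needs only a count of the candidates whose bit is set and a mask update, which suggests why bit-wise processing of this kind can map to simple logic.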
