Parallelism Efficiency in Convolutional Turbo Decoding

Parallel turbo decoding is becoming mandatory in order to achieve high throughput and to reduce latency, both crucial in emerging digital communication applications. This paper explores and analyzes parallelism techniques in convolutional turbo decoding with the BCJR algorithm. A three-level structured classification of parallelism techniques is proposed and discussed: BCJR metric level parallelism, BCJR-SISO decoder level parallelism, and turbo-decoder level parallelism. The second level of this classification is thoroughly analyzed on the basis of parallelism efficiency criteria, since it offers the best tradeoff between achievable parallelism degree and area overhead. At this level, for subblock parallelism, we illustrate that subblock initialization is more efficient with the message passing technique than with the acquisition approach. Moreover, subblock parallelism becomes quite inefficient at high subblock parallelism degrees. Conversely, the efficiency of component-decoder parallelism increases with the subblock parallelism degree. This efficiency also depends on the BCJR computation scheme and on propagation time. We show that component-decoder parallelism using shuffled decoding maximizes architecture efficiency and is hence well suited to hardware implementation of high-throughput turbo decoders.
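To illustrate why subblock parallelism can lose efficiency as its degree grows, the following is a minimal sketch under a toy timing model, not the efficiency criteria used in the paper. It assumes that a frame of `frame_len` trellis steps is split evenly across `degree` subblocks, that each subblock spends `acq_len` extra recursion steps on acquisition, and that efficiency is speedup divided by parallelism degree. The function name and all parameter values are hypothetical.

```python
def subblock_efficiency(frame_len: int, degree: int, acq_len: int) -> float:
    """Toy efficiency (speedup / degree) of subblock parallelism.

    Assumption: processing time is proportional to the number of
    trellis steps, and every subblock pays `acq_len` extra steps
    for acquisition whenever the frame is actually split.
    """
    if degree < 1:
        raise ValueError("degree must be at least 1")
    serial_time = frame_len
    # No acquisition overhead is charged when the frame is not split.
    overhead = acq_len if degree > 1 else 0
    parallel_time = frame_len / degree + overhead
    speedup = serial_time / parallel_time
    return speedup / degree


# Hypothetical values: 2048-step frame, 32-step acquisition window.
for p in (1, 2, 8, 32, 64):
    print(p, round(subblock_efficiency(2048, p, 32), 3))
```

In this model the fixed acquisition cost takes up a growing share of each subblock's work as the subblocks shrink, so efficiency falls as the degree rises. Setting `acq_len` to 0 corresponds to an idealized initialization with no extra recursion steps; the model does not capture any effect of the initialization method on convergence.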
