Accelerating Viterbi Algorithm using Custom Instruction Approach

In recent years, decoding algorithms in communication networks have become increasingly complex in order to decode received messages with high reliability. These algorithms involve computationally intensive operations that require high-performance computing hardware, which is generally expensive. A cost-effective alternative is to extend the Instruction Set Architecture (ISA) of a processor with custom instructions for the computational parts of the decoding algorithms. In this paper, we use the custom instruction approach to efficiently implement the widely used Viterbi decoding algorithm by adding new assembly language instructions to the ISAs of the DLX, PicoJava II and NIOS II processors, which represent RISC, stack-based and FPGA-based soft-core processor architectures, respectively. With the custom instructions, the Viterbi algorithm runs approximately 3 times faster on DLX and PicoJava II, and 2 times faster on NIOS II.
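To make the "computational parts" of Viterbi decoding concrete, the following is a minimal software sketch of a hard-decision Viterbi decoder. It is an illustration only, not the implementation evaluated in the paper: the code parameters (rate 1/2, constraint length 3, generators 7 and 5 in octal) and the names `encode`, `decode` and `acs` are assumptions chosen for brevity. The abstract does not say which operations were turned into custom instructions; the add-compare-select (ACS) step is isolated here because it is the inner-loop kernel that such instructions would plausibly target.

```python
# Illustrative sketch only. Code parameters are assumed, not taken from the paper:
# rate-1/2 convolutional code, constraint length K = 3, generators (7, 5) octal.
G = (0b111, 0b101)
K = 3
N_STATES = 1 << (K - 1)


def parity(x):
    """Return the XOR of all bits of x."""
    return bin(x).count("1") & 1


def encode(bits):
    """Convolutionally encode a list of bits, starting from the all-zero state."""
    state, out = 0, []
    for b in bits:
        reg = (b << (K - 1)) | state      # newest bit sits in the MSB
        out.extend(parity(reg & g) for g in G)
        state = reg >> 1                  # shift register advances
    return out


def acs(metric_a, branch_a, metric_b, branch_b):
    """Add-compare-select: add branch metrics, keep the smaller path metric.

    Returns (survivor metric, 0 or 1 for which predecessor won). This small
    kernel is the hypothetical candidate for a custom instruction.
    """
    cand_a = metric_a + branch_a
    cand_b = metric_b + branch_b
    return (cand_a, 0) if cand_a <= cand_b else (cand_b, 1)


def decode(received):
    """Hard-decision Viterbi decoding using Hamming-distance branch metrics."""
    inf = float("inf")
    metrics = [0] + [inf] * (N_STATES - 1)   # decoder starts in state 0
    history = []
    for i in range(0, len(received), len(G)):
        r = received[i:i + len(G)]
        new_metrics, survivors = [inf] * N_STATES, [0] * N_STATES
        for nxt in range(N_STATES):
            b = nxt >> (K - 2)               # input bit that leads into nxt
            preds = []
            for low in (0, 1):               # the two possible predecessor states
                prev = ((nxt << 1) & (N_STATES - 1)) | low
                reg = (b << (K - 1)) | prev
                expected = [parity(reg & g) for g in G]
                preds.append((prev, sum(e != x for e, x in zip(expected, r))))
            (pa, ba), (pb, bb) = preds
            new_metrics[nxt], sel = acs(metrics[pa], ba, metrics[pb], bb)
            survivors[nxt] = (pa, pb)[sel]
        metrics = new_metrics
        history.append(survivors)
    # Trace back from the best final state to recover the input bits.
    state = min(range(N_STATES), key=metrics.__getitem__)
    bits = []
    for survivors in reversed(history):
        bits.append(state >> (K - 2))
        state = survivors[state]
    return bits[::-1]
```

In this sketch `acs` runs once per state per received symbol, so its cost dominates the decoding loop. Collapsing its two additions, one comparison and one selection into a single instruction is the kind of saving the custom instruction approach aims for, although the actual instructions added to DLX, PicoJava II and NIOS II may differ.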
