Design and Implementation of 2D IDCT/IDST-Specific Accelerator on Heterogeneous Multicore Architecture

This paper describes how to implement different sizes of the Inverse Discrete Cosine Transform (IDCT) and the Inverse Discrete Sine Transform (IDST) used in the High Efficiency Video Coding (HEVC) standard by employing Coarse-Grained Reconfigurable Arrays (CGRAs) as template-based accelerators on the Heterogeneous Accelerator-Rich Platform (HARP). The proposed design provides multi-purpose IDCT/IDST accelerators such that the final architecture supports the 4-point IDST and the 4/8-point IDCT. The accelerators are built by creating template-based CGRA devices of various dimensions, which are then arranged sequentially over a Network-on-Chip (NoC) based structure together with a number of RISC cores. The paper reports the performance of the IDCT/IDST-specific accelerators, the performance of the entire platform, and the NoC traffic in terms of the total number of clock cycles and several other high-level performance metrics. The experiments show that the 4-point IDCT and the 4-point IDST can each be fully computed in 56 clock cycles, while the 8-point IDCT requires 64 clock cycles. The total power dissipation, estimated from post-placement and routing information, is 4.03 mW, and the energy consumption is 1.76 $\mu J$ for the 4-point IDCT/IDST and 3.06 $\mu J$ for the 8-point IDCT. Furthermore, 256 instantiated Processing Elements (PEs) at an operating frequency of 200.0 MHz yield a performance of 51.2 Giga Operations Per Second (GOPS) and an architectural constant of 0.012 GOPS/mW for the HARP model on a 28 nm Altera Stratix-V chip. The proposed architecture is capable of fully sustaining the Full HD 1080p format at 30 fps on FPGA.
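The throughput and latency figures above can be cross-checked with simple arithmetic. The sketch below is illustrative only: the helper names `peak_gops` and `cycles_to_microseconds` are hypothetical, and it assumes that each PE completes one operation per clock cycle, which the abstract does not state explicitly but which is consistent with the reported 51.2 GOPS.

```python
def peak_gops(num_pes: int, freq_mhz: float, ops_per_pe_per_cycle: int = 1) -> float:
    """Peak throughput in GOPS.

    Assumption (not stated in the abstract): every PE finishes
    `ops_per_pe_per_cycle` operations in each clock cycle.
    """
    ops_per_second = num_pes * freq_mhz * 1e6 * ops_per_pe_per_cycle
    return ops_per_second / 1e9


def cycles_to_microseconds(cycles: int, freq_mhz: float) -> float:
    """Convert a clock-cycle count to microseconds at the given frequency."""
    # cycles / (freq_mhz * 1e6) seconds, expressed in microseconds
    return cycles / freq_mhz


if __name__ == "__main__":
    # 256 PEs at 200 MHz, as reported for the HARP model
    print(f"peak throughput: {peak_gops(256, 200.0):.1f} GOPS")
    # Reported cycle counts for the 4-point and 8-point transforms
    print(f"4-point IDCT/IDST: {cycles_to_microseconds(56, 200.0):.2f} us")
    print(f"8-point IDCT:      {cycles_to_microseconds(64, 200.0):.2f} us")
```

Under that assumption, 256 PEs at 200 MHz reproduce the reported 51.2 GOPS, and the 56-cycle and 64-cycle transforms correspond to 0.28 and 0.32 microseconds of accelerator time, respectively. These conversions cover only the cycle counts quoted above and say nothing about data-transfer overhead on the NoC.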
