Accelerating Heap-Based Priority Queue in Image Coding Application Using Parallel Index-Aware Tree Access

We present a novel heap-based priority queue structure for hardware implementation, which is employed by a wavelet-based image encoder. The architecture makes efficient, adaptive use of the FPGA's on-chip dual-port memories. Using a 2x clock, we created four memory ports and combined them with intelligent concatenation of parent and child queue elements and with an index-aware system linked to each key in the queue. These innovations yielded cost-effective, enhanced memory access. The memory ports are adaptively assigned to different units during the different computation phases, so that each phase makes the best use of the memory accesses it requires. We designed this architecture for incorporation into our Adaptive Scanning of Wavelet Data (ASWD) module, which reorganizes the wavelet coefficients into locally stationary sequences for a wavelet-based image encoder. We validated the hardware on an Altera Stratix IV FPGA as an IP accelerator in a Nios II processor-based System on Chip. The architectural innovations can also be exploited in other applications that require efficient hardware implementations of a priority queue. We show that our architecture, running at 150 MHz, provides a 45x speedup over an embedded ARM Cortex-A9 processor running at 666 MHz.
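The index-aware idea can be illustrated in software. The following is a minimal sketch, not the hardware architecture described above: it assumes a binary min-heap stored in an array, and the class name `IndexAwareHeap`, its methods, and the `pos` map are hypothetical names chosen for illustration. Each key carries an item id, and the map from id to array slot is kept current on every swap, so a key can be located and updated without searching the heap.

```python
class IndexAwareHeap:
    """Binary min-heap that tracks the array slot of every item id (illustrative only)."""

    def __init__(self):
        self.keys = []  # keys in heap order
        self.ids = []   # ids[i] is the item id stored in slot i
        self.pos = {}   # pos[item_id] is the slot currently holding that item

    def _swap(self, i, j):
        # Exchange two slots and keep the id-to-slot map consistent.
        self.keys[i], self.keys[j] = self.keys[j], self.keys[i]
        self.ids[i], self.ids[j] = self.ids[j], self.ids[i]
        self.pos[self.ids[i]] = i
        self.pos[self.ids[j]] = j

    def _sift_up(self, i):
        while i > 0:
            parent = (i - 1) // 2
            if self.keys[parent] <= self.keys[i]:
                break
            self._swap(i, parent)
            i = parent

    def _sift_down(self, i):
        n = len(self.keys)
        while True:
            left, right = 2 * i + 1, 2 * i + 2
            smallest = i
            # A parent and both children are compared in each step.
            if left < n and self.keys[left] < self.keys[smallest]:
                smallest = left
            if right < n and self.keys[right] < self.keys[smallest]:
                smallest = right
            if smallest == i:
                break
            self._swap(i, smallest)
            i = smallest

    def push(self, item_id, key):
        if item_id in self.pos:
            raise KeyError(f"duplicate item id: {item_id!r}")
        self.keys.append(key)
        self.ids.append(item_id)
        self.pos[item_id] = len(self.keys) - 1
        self._sift_up(len(self.keys) - 1)

    def update(self, item_id, new_key):
        # The index map gives the slot directly; no scan of the heap is needed.
        i = self.pos[item_id]
        old_key = self.keys[i]
        self.keys[i] = new_key
        if new_key < old_key:
            self._sift_up(i)
        else:
            self._sift_down(i)

    def pop(self):
        if not self.keys:
            raise IndexError("pop from empty heap")
        self._swap(0, len(self.keys) - 1)
        key = self.keys.pop()
        item_id = self.ids.pop()
        del self.pos[item_id]
        if self.keys:
            self._sift_down(0)
        return item_id, key
```

In this sketch each sift-down step reads a parent and its two children. Storing related elements in one concatenated memory word and providing several memory ports, as the abstract describes, is one way hardware can fetch such operands in fewer memory cycles; the exact word layout used by the authors is not given here.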
