Lifting Scheme Cores for Wavelet Transform

This thesis focuses on the efficient computation of the two-dimensional discrete wavelet transform. State-of-the-art methods are extended in several ways so that the transform is performed in a single loop, possibly in a multi-scale fashion, using a compact streaming core. This core can be further reorganized to minimize the use of particular platform resources. The presented approach maps well onto common SIMD extensions, exploits the cache hierarchy of modern general-purpose processors, and is suitable for parallel evaluation. Finally, the approach is incorporated into the JPEG 2000 compression chain, where it proved to be substantially faster than widely used implementations.
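
For readers unfamiliar with lifting, the following is a minimal one-dimensional sketch of the kind of predict and update steps that such a core is built from. It assumes the reversible 5/3 filter used by JPEG 2000, an even-length integer signal, and symmetric border extension; the function names `lift_53_forward` and `lift_53_inverse` are hypothetical. It is not the single-loop core of the thesis: the two steps run here as separate passes, which is the kind of schedule a single-loop formulation would fuse.

```python
def lift_53_forward(x):
    """Forward reversible 5/3 lifting of an even-length list of ints.

    Returns (s, d): low-pass and high-pass coefficients.
    Illustrative sketch only; names and layout are assumptions.
    """
    n = len(x)
    assert n >= 2 and n % 2 == 0, "sketch handles even-length signals only"
    half = n // 2
    even, odd = x[0::2], x[1::2]

    # Predict step: each odd sample minus the floor-average of its even neighbours.
    d = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]  # symmetric extension
        d.append(odd[i] - ((even[i] + right) >> 1))

    # Update step: each even sample plus a rounded quarter of neighbouring details.
    s = []
    for i in range(half):
        left = d[i - 1] if i > 0 else d[0]  # symmetric extension
        s.append(even[i] + ((left + d[i] + 2) >> 2))
    return s, d


def lift_53_inverse(s, d):
    """Inverse of lift_53_forward: undo update, then undo predict."""
    half = len(s)
    even = []
    for i in range(half):
        left = d[i - 1] if i > 0 else d[0]
        even.append(s[i] - ((left + d[i] + 2) >> 2))

    x = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]
        x.append(even[i])
        x.append(d[i] + ((even[i] + right) >> 1))
    return x


signal = [3, 7, 1, 8, 2, 9, 4, 6]
low, high = lift_53_forward(signal)
restored = lift_53_inverse(low, high)
print(restored == signal)
```

Because every step adds or subtracts an integer computed only from the other channel, the inverse simply replays the steps in reverse order with opposite signs, so reconstruction is exact. A two-dimensional transform can apply such steps along rows and columns.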
