Parallel architecture for the discrete wavelet transform based on the lifting factorization

One major difficulty in designing an architecture for the parallel implementation of the Discrete Wavelet Transform (DWT) is that the DWT is not a block transform. As a result, processors must communicate frequently to exchange the data needed to compute correct boundary wavelet coefficients. This communication overhead limits the efficiency of parallel systems, especially on processor networks with large communication latencies. In this paper we propose a new technique, called Boundary Postprocessing, that allows boundary samples to be transformed correctly. The basic idea is to model the DWT as a Finite State Machine based on the lifting factorization of the wavelet filterbanks. Applying this technique leads to a new parallel DWT architecture, Split-and-Merge, which requires data to be communicated only once between neighboring processors for an arbitrary number of wavelet decomposition levels. Example designs and performance analysis for 1D and 2D DWT show that the proposed technique can greatly reduce the interprocessor communication overhead. As an example, in a two-processor case our proposed approach shows an average speedup of about 30% compared with the best currently available parallel computation.
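
The idea described above can be illustrated with a minimal sketch. It is not the architecture proposed in the paper. It assumes a one-level 5/3 lifting filter with symmetric extension at the signal ends, two processors, and an even-length input whose two blocks each hold at least four samples. All function names (lift53, split_left, split_right, merge, split_and_merge) are hypothetical. Each block is lifted as far as its local data allows, the left block keeps its unfinished boundary coefficients as a small state, and a single transfer of that state to the right neighbor completes them.

```python
def lift53(x):
    """Sequential one-level 5/3 lifting, used here only as a reference."""
    even, odd = x[0::2], x[1::2]
    n = len(odd)
    # Predict step; the right end mirrors the last even sample.
    d = [odd[i] - 0.5 * (even[i] + even[min(i + 1, n - 1)]) for i in range(n)]
    # Update step; the left end mirrors the first detail coefficient.
    s = [even[i] + 0.25 * (d[max(i - 1, 0)] + d[i]) for i in range(n)]
    return s, d


def split_left(block):
    """Split stage on the left processor: finish everything that needs
    only local samples and keep the boundary pair as partial state."""
    even, odd = block[0::2], block[1::2]
    n = len(odd)
    d = [odd[i] - 0.5 * (even[i] + even[i + 1]) for i in range(n - 1)]
    s = [even[i] + 0.25 * (d[max(i - 1, 0)] + d[i]) for i in range(n - 1)]
    d_part = odd[n - 1] - 0.5 * even[n - 1]   # still lacks the neighbor's first even sample
    s_part = even[n - 1] + 0.25 * d[n - 2]    # still lacks the completed last detail
    return s, d, (s_part, d_part)


def split_right(block):
    """Split stage on the right processor: only the first smooth
    coefficient depends on data held by the left neighbor."""
    even, odd = block[0::2], block[1::2]
    n = len(odd)
    d = [odd[i] - 0.5 * (even[i] + even[min(i + 1, n - 1)]) for i in range(n)]
    s_part0 = even[0] + 0.25 * d[0]           # still lacks the left block's last detail
    s_rest = [even[i] + 0.25 * (d[i - 1] + d[i]) for i in range(1, n)]
    return s_part0, s_rest, d


def merge(left_state, right_first_even, s_part0):
    """Merge stage: the left state is sent once to the right processor,
    which completes the three boundary coefficients."""
    s_part, d_part = left_state
    d_last = d_part - 0.5 * right_first_even
    return s_part + 0.25 * d_last, d_last, s_part0 + 0.25 * d_last


def split_and_merge(x):
    """Two-block transform with a single boundary exchange."""
    h = len(x) // 2
    h -= h % 2                                # keep the split point even
    left, right = x[:h], x[h:]
    s_l, d_l, state = split_left(left)
    s_r0, s_r, d_r = split_right(right)
    s_l_last, d_l_last, s_r_first = merge(state, right[0], s_r0)
    return s_l + [s_l_last, s_r_first] + s_r, d_l + [d_l_last] + d_r
```

In this sketch the state holds two numbers per block boundary, and the merged result can be checked against lift53 on the unsplit signal. Extending it to several decomposition levels with a single exchange is the part the paper addresses and is not attempted here.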