An efficient VLSI architecture for separable 2-D discrete wavelet transform

In this paper, we present a VLSI architecture for the separable 2-D Discrete Wavelet Transform (DWT). Based on the 1-D DWT recursive pyramid algorithm (RPA), a complete 2-D DWT output scheduling scheme is derived. The I/O between the DWT core and the memory that stores the intermediate results is simplified by a “circular coefficients arrangement”. The concept of storing the “partial accumulation sum” of the convolution operation in the column direction is first proposed in this paper. For an N×N 2-D DWT with filter length L, our architecture takes N² clock cycles and requires 2NL words of memory, 4L multipliers, and 4L-2 adders. The numbers of multipliers and adders can be further reduced to 2L and 2L-1, respectively, by sharing the hardware between the positive and negative clock edges. The architecture is suitable for VLSI implementation and for various real-time video and image applications.
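The idea of keeping partial accumulation sums for the column-direction convolution can be illustrated in software. The following is a minimal sketch, not the architecture described in this paper: it assumes a single filter, decimation by 2, no boundary extension, and rows arriving one at a time from the row-direction filter. The function names `column_filter_direct` and `column_filter_partial_sums` are hypothetical and chosen for illustration only. Each incoming row is multiplied by the relevant filter taps and added to the running sums of the output rows it contributes to, so no block of L input rows has to be buffered.

```python
def column_filter_direct(rows, h):
    """Reference: decimated column convolution over fully buffered rows.

    y[m][c] = sum_k h[k] * rows[2*m + k][c], no boundary extension.
    """
    L, n_rows, n_cols = len(h), len(rows), len(rows[0])
    out = []
    for m in range((n_rows - L) // 2 + 1):
        out.append([sum(h[k] * rows[2 * m + k][c] for k in range(L))
                    for c in range(n_cols)])
    return out


def column_filter_partial_sums(row_stream, h, n_cols):
    """Same result, computed by accumulating partial sums as rows arrive.

    Assumption: this is an illustrative software model only; it does not
    reproduce the scheduling or memory layout of any specific hardware.
    """
    L = len(h)
    acc = {}   # output row index m -> running partial sums, one per column
    out = []
    for r, row in enumerate(row_stream):
        # Input row r feeds output row m through tap k = r - 2*m, 0 <= k < L.
        m_lo = max(0, (r - L + 2) // 2)   # ceil((r - L + 1) / 2), clamped at 0
        m_hi = r // 2
        for m in range(m_lo, m_hi + 1):
            k = r - 2 * m
            sums = acc.setdefault(m, [0] * n_cols)
            for c in range(n_cols):
                sums[c] += h[k] * row[c]
            if k == L - 1:
                # Last tap applied: this output row is complete.
                out.append(acc.pop(m))
    return out


# Small usage example with an arbitrary 4-tap integer filter.
rows = [[3 * r + c for c in range(3)] for r in range(8)]
h = [1, 3, 3, 1]
streamed = column_filter_partial_sums(iter(rows), h, n_cols=3)
buffered = column_filter_direct(rows, h)
```

In this sketch the two functions return the same output rows; the streaming version holds only the accumulators of the output rows still in progress, which is the property the partial-sum idea relies on.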