Parallel 4×4 2D transform and inverse transform architecture for MPEG-4 AVC/H.264

Transform coding is widely used in video coding standards. This paper presents a hardware architecture for accelerating the transform coding operations in MPEG-4 AVC/H.264. The architecture processes four inputs in parallel using previously described fast algorithms. The transpose operations are implemented by a register array with directional transfers. The architecture has been mapped onto a 4×4 multiple-transform unit and synthesized in TSMC 0.35 µm technology. The multiple-transform processor can process 320 Mpixels/s at 80 MHz, i.e. four pixels per clock cycle, for all 4×4 transforms used in MPEG-4 AVC/H.264.
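
The row-column data flow described above can be illustrated with a minimal software sketch. It assumes the widely known add/shift butterfly for the H.264 4×4 forward core transform; the abstract does not spell out which fast algorithms are used, so this may differ from the paper's design. The function names (`forward_1d`, `forward_2d`, `forward_2d_reference`) are hypothetical. The `transpose` helper only stands in for the register array with directional transfers; it does not model the hardware. The inverse and Hadamard transforms are not shown.

```python
# Forward core transform matrix of H.264 (Y = CF * X * CF^T).
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def forward_1d(x):
    """1D 4-point transform on four inputs at once, using a butterfly.

    Only additions, subtractions and doublings (shifts in hardware).
    """
    x0, x1, x2, x3 = x
    s0, s1 = x0 + x3, x1 + x2   # first stage: sums
    d0, d1 = x0 - x3, x1 - x2   # first stage: differences
    return [s0 + s1, 2 * d0 + d1, s0 - s1, d0 - 2 * d1]


def transpose(m):
    """Software stand-in for the transpose register array (assumption)."""
    return [list(col) for col in zip(*m)]


def forward_2d(block):
    """2D transform as row pass, transpose, row pass, transpose."""
    rows = [forward_1d(r) for r in block]
    cols = [forward_1d(c) for c in transpose(rows)]
    return transpose(cols)


def matmul(a, b):
    """Plain 4x4 integer matrix product, used only for checking."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def forward_2d_reference(block):
    """Direct evaluation of CF * X * CF^T for comparison."""
    return matmul(matmul(CF, block), transpose(CF))


if __name__ == "__main__":
    x = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
    assert forward_2d(x) == forward_2d_reference(x)
    print(forward_2d(x))
```

Each call to `forward_1d` consumes four samples, mirroring the four-inputs-in-parallel organization; at one such group per cycle, an 80 MHz clock corresponds to the stated 320 Mpixels/s.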