Asynchronous transpose-matrix architectures

The matrix transposition operation is a necessary step in several image/video compression and decompression algorithms, in particular the discrete cosine transform (DCT) and its inverse (IDCT), as well as some distributed-arithmetic applications. These algorithms must run at high data rates and, for portable applications, with minimal power dissipation. The authors describe how the clocked solution is usually implemented and present two new asynchronous architectures for matrix transposition, one based on two-phase signaling and one based on four-phase signaling. Both achieve lower latency and lower power than the clocked solution at no cost in area or throughput. The authors discuss the characteristics of the three architectures and evaluate the relative advantages of each.
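
Transposition is needed in the DCT because the 2-D DCT is separable. It can be computed by the row-column method: a 1-D DCT over every row, a transposition, then a 1-D DCT over every column. The Python sketch below is a behavioral model of that data flow only. It assumes 8x8 blocks (the size used by JPEG and MPEG) and the orthonormal DCT-II. The function names are hypothetical, and the sketch does not model the clocked or asynchronous hardware the authors describe.

```python
import math

# Assumption: 8x8 blocks, as in JPEG/MPEG; the abstract does not fix the block size.
N = 8


def dct_1d(x):
    """Orthonormal 1-D DCT-II of a sequence of length n."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        out.append(scale * s)
    return out


def transpose(block):
    """Return the transposed block: element [r][c] moves to [c][r]."""
    return [list(col) for col in zip(*block)]


def dct_2d_row_column(block):
    """2-D DCT via the row-column method (hypothetical helper name)."""
    # Pass 1: 1-D DCT along every row.
    rows_done = [dct_1d(row) for row in block]
    # Transposition: the column pass consumes the row-pass results in the
    # orthogonal order. In hardware this is the job of a transpose memory.
    cols_as_rows = transpose(rows_done)
    # Pass 2: 1-D DCT along every former column.
    cols_done = [dct_1d(row) for row in cols_as_rows]
    # Transpose back so the result is indexed [u][v], with u along the rows.
    return transpose(cols_done)


def dct_2d_direct(block):
    """Reference 2-D DCT-II from the direct double sum, used only for checking."""
    n = len(block)

    def c(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    return [
        [
            c(u) * c(v) * sum(
                block[i][j]
                * math.cos(math.pi * (2 * i + 1) * u / (2 * n))
                * math.cos(math.pi * (2 * j + 1) * v / (2 * n))
                for i in range(n)
                for j in range(n)
            )
            for v in range(n)
        ]
        for u in range(n)
    ]
```

In a hardware pipeline, the `transpose` step corresponds to a transpose memory. The row pass writes coefficients in row order and the column pass reads them in column order, so the first column is not complete until every row of the block has been processed. The buffering this forces is what the clocked and asynchronous transpose architectures implement.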
