Recent microprocessors have been enhanced with media instruction sets that accelerate media algorithms. These instruction sets exploit the fact that media algorithms operate on small data types whose widths are much narrower than the processor's native width. Current media instruction sets support only 8-, 16- and 32-bit sub-datatypes, which is inefficient in several applications that use bit lengths such as 9 or 12. User-programmable sub-datatype bit lengths are needed. S. Balakrishnan and S.K. Nandy (1998) discuss arbitrary-boundary packed addition. Many media algorithms are built on multiply-accumulate operations, so full acceleration also requires arbitrary-boundary packed multiplication. We present such a scheme based on Wallace tree multiplication. We also extend the work of Balakrishnan and Nandy with a detailed treatment of the intermediate carries of sub-datatypes, which were lost in the previous work. These carries can be used for saturation arithmetic and flow control.
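To make the idea of arbitrary-boundary packed addition concrete, the following is a minimal software sketch, not the hardware scheme of the paper. It assumes unsigned sub-datatypes of one user-chosen width packed contiguously into a single integer. The names `pack`, `unpack`, `packed_add` and `saturate` are hypothetical and chosen only for illustration. The sketch blocks carries at each sub-datatype boundary, keeps the carry out of every field instead of discarding it, and then uses those carries for unsigned saturation.

```python
def pack(values, width):
    """Pack unsigned values into one integer, `width` bits per field (field 0 lowest)."""
    mask = (1 << width) - 1
    word = 0
    for i, v in enumerate(values):
        word |= (v & mask) << (i * width)
    return word


def unpack(word, width, lanes):
    """Split a packed integer back into `lanes` fields of `width` bits."""
    mask = (1 << width) - 1
    return [(word >> (i * width)) & mask for i in range(lanes)]


def packed_add(a, b, width, lanes):
    """Add corresponding fields of a and b without letting carries cross fields.

    Returns (packed_sum, carries), where carries[i] is the carry out of field i.
    """
    mask = (1 << width) - 1
    packed_sum = 0
    carries = []
    for i in range(lanes):
        shift = i * width
        total = ((a >> shift) & mask) + ((b >> shift) & mask)
        packed_sum |= (total & mask) << shift  # wrap the result inside its field
        carries.append(total >> width)         # 0 or 1: the carry that would be lost
    return packed_sum, carries


def saturate(packed_sum, carries, width):
    """Clamp every field that produced a carry to the maximum unsigned value."""
    mask = (1 << width) - 1
    for i, carry in enumerate(carries):
        if carry:
            packed_sum |= mask << (i * width)
    return packed_sum


# Three 12-bit fields, a width that 8/16/32-bit media instructions cannot express.
a = pack([4000, 100, 4095], 12)
b = pack([200, 50, 1], 12)
wrapped, carries = packed_add(a, b, 12, 3)
clamped = saturate(wrapped, carries, 12)
```

In this example the first and third fields overflow 12 bits, so their carries are set, and the saturated result clamps exactly those two fields while leaving the second untouched. The per-field loop is chosen for readability. A hardware implementation would instead suppress carry propagation at the programmed boundaries inside the adder.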
[1] Andrew D. Booth, et al. A Signed Binary Multiplication Technique. 1951.
[2] S. K. Nandy, et al. Arbitrary precision arithmetic-SIMD style. 1998, Proceedings Eleventh International Conference on VLSI Design.
[3] Jalil Fadavi-Ardekani, et al. M*N Booth encoded multiplier generator using optimized Wallace trees. 1992, Proceedings 1992 IEEE International Conference on Computer Design: VLSI in Computers & Processors.
[4] D. H. Jacobsohn, et al. A Suggestion for a Fast Multiplier. 1964, IEEE Trans. Electron. Comput.
[5] J. Pihl, et al. A multiplier and squarer generator for high performance DSP applications. 1996, Proceedings of the 39th Midwest Symposium on Circuits and Systems.
[6] Joseph J. F. Cavanagh. Digital Computer Arithmetic: Design And Implementation. 1984.