We discuss mapping the matrix multiplication algorithm onto a two-level hierarchical memory system with DMA capabilities between the levels, as found on Digital Signal Processors (DSPs). We show that the hierarchical nature of the memory system can be hidden from the processor, so that computation proceeds at the processor’s speed. This is accomplished by using a block algorithm and by prefetching data under DMA control from the slower second-level memory into the faster but smaller first-level memory. The Texas Instruments TMS320C30 Digital Signal Processor is used as an example, and performance estimates are given for different memory timings. These estimates are compared with the performance of the same matrix multiplication algorithm executed without exploiting the DMA capabilities.
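The combination of blocking and prefetching described above can be illustrated with a minimal sketch. It is not the paper's implementation: the names `load_block` and `blocked_matmul`, the square block size `b`, and the use of two buffers (`cur` and `nxt`) are assumptions made for illustration. The Python copy in `load_block` only stands in for a DMA transfer; on a real DSP the transfer would run concurrently with the inner-product loop, which is what lets the slow memory be hidden.

```python
def load_block(M, r0, c0, b):
    """Hypothetical stand-in for a DMA transfer: copy a b x b block of M
    (the 'slow' second-level memory) into a fresh list (the 'fast' memory)."""
    return [row[c0:c0 + b] for row in M[r0:r0 + b]]


def blocked_matmul(A, B, b):
    """Multiply square matrices A and B using b x b blocks.

    The blocks for step bk + 1 are requested before the arithmetic for
    step bk starts (double buffering). Here the request completes
    immediately; real hardware would overlap it with the computation.
    """
    n = len(A)
    assert n % b == 0, "sketch assumes the block size divides n"
    nb = n // b
    C = [[0.0] * n for _ in range(n)]

    for bi in range(nb):
        for bj in range(nb):
            # Fetch the first pair of blocks for this block of C.
            nxt = (load_block(A, bi * b, 0, b),
                   load_block(B, 0, bj * b, b))
            for bk in range(nb):
                cur = nxt
                # Start the "DMA" for the next pair before computing.
                if bk + 1 < nb:
                    nxt = (load_block(A, bi * b, (bk + 1) * b, b),
                           load_block(B, (bk + 1) * b, bj * b, b))
                a_blk, b_blk = cur
                # Compute entirely out of the fast-memory copies.
                for i in range(b):
                    for j in range(b):
                        s = 0.0
                        for k in range(b):
                            s += a_blk[i][k] * b_blk[k][j]
                        C[bi * b + i][bj * b + j] += s
    return C
```

For example, `blocked_matmul(A, B, 2)` on two 4 x 4 matrices gives the same product as the ordinary triple loop. The block size would in practice be chosen so that the two input buffers and the working blocks fit in first-level memory.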