Transformations of a 3D image reconstruction algorithm for data transfer and storage optimisation

When implementing a 3D image reconstruction algorithm on a DSP architecture, we are confronted with a large memory transfer overhead, which limits the speedup attainable on recent multimedia-oriented architectures. This paper describes how the critical part of the algorithm is re-specified and aggressively transformed to improve the data access locality of the multi-dimensional image signal while preserving the input/output behaviour. Experiments show that close-to-optimal reuse of the data in the foreground memory and registers is obtained, which removes the data transfer and storage bottleneck and enables real-time prototyping of the algorithm on a DSP architecture.
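To make the idea of a locality-improving, behaviour-preserving transformation concrete, the following C sketch may help. It is not the reconstruction algorithm or the transformation sequence studied in this paper. The kernel (a 3-tap sum along x followed by a division), the volume dimensions, and all function names are assumptions chosen purely for illustration. The baseline version writes a full intermediate volume to memory and reads every input sample three times. The transformed version fuses the two passes and keeps a sliding window of three samples in local variables, so the intermediate volume disappears and each input sample is loaded once per row.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical volume dimensions, chosen only for illustration. */
enum { NZ = 4, NY = 5, NX = 8 };
#define VOXELS (NZ * NY * NX)

/* Flat index of voxel (z, y, x); the assert is a debug-only bounds check. */
static size_t idx(int z, int y, int x)
{
    assert(z >= 0 && z < NZ && y >= 0 && y < NY && x >= 0 && x < NX);
    return ((size_t)z * NY + (size_t)y) * NX + (size_t)x;
}

/* Baseline: two passes over the volume with a full intermediate buffer.
 * Each input sample is read up to three times, and tmp is written once
 * and read once, all through the (slow) background memory. */
void filter_baseline(const int *in, int *out)
{
    static int tmp[VOXELS];

    for (int z = 0; z < NZ; ++z)
        for (int y = 0; y < NY; ++y)
            for (int x = 0; x < NX; ++x) {
                int xl = (x > 0) ? x - 1 : 0;            /* replicate border */
                int xr = (x < NX - 1) ? x + 1 : NX - 1;
                tmp[idx(z, y, x)] = in[idx(z, y, xl)]
                                  + in[idx(z, y, x)]
                                  + in[idx(z, y, xr)];
            }

    for (int z = 0; z < NZ; ++z)
        for (int y = 0; y < NY; ++y)
            for (int x = 0; x < NX; ++x)
                out[idx(z, y, x)] = tmp[idx(z, y, x)] / 3;
}

/* Transformed: the two passes are fused and the 3-sample window is kept
 * in local variables (register candidates). No intermediate volume is
 * needed and each input sample is loaded once per row. */
void filter_transformed(const int *in, int *out)
{
    for (int z = 0; z < NZ; ++z)
        for (int y = 0; y < NY; ++y) {
            const int *row = in + idx(z, y, 0);
            int *dst = out + idx(z, y, 0);
            int mid = row[0];
            int left = mid;                              /* replicate border */
            for (int x = 0; x < NX; ++x) {
                int right = (x < NX - 1) ? row[x + 1] : mid;
                dst[x] = (left + mid + right) / 3;
                left = mid;                              /* slide the window */
                mid = right;
            }
        }
}

/* Returns 1 if both volumes hold identical values, 0 otherwise. */
int outputs_match(const int *a, const int *b)
{
    for (size_t i = 0; i < (size_t)VOXELS; ++i)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

Running both functions on the same input and comparing the results with outputs_match is a simple way to check that the input/output behaviour is preserved. On a real DSP, the benefit would depend on the memory hierarchy and on how the compiler allocates registers, which this sketch does not model.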