Transformations of a 3D Image Reconstruction Algorithm for Data Transfer and Storage Optimisation

Implementing a 3D image reconstruction algorithm on a DSP architecture exposes a large memory transfer overhead, which limits the speedup attainable on recent multimedia-oriented architectures. This paper describes how the critical part of the algorithm is re-specified and aggressively transformed at the level of the algorithm code to improve the data access locality of the multi-dimensional image signal while preserving the input/output behaviour. Experiments show that close-to-optimal reuse of the data in foreground memory and registers is obtained, which removes the data transfer and storage bottleneck and enables real-time prototyping of the algorithm on a DSP architecture.
