System-Level Memory Management for Weakly Parallel Image Processing

Application studies in image and video processing indicate that memory units account for between 50% and 80% of the area cost in (application-specific) architectures for multi-dimensional (M-D) signal processing. This holds for both single-processor and weakly parallel processor realizations. This paper makes two main contributions. First, to reduce this dominant cost, we propose addressing the system-level storage organization for the M-D signals as the first step in the overall methodology for mapping these applications. Second, we demonstrate the usefulness of this novel approach on a realistic image processing test vehicle, namely a cavity detection algorithm. The design results obtained for this application are also of interest in their own right.
