Memory Allocation for Window-Based Image Processing on Multiple Memory Modules with Simple Addressing Functions

Accelerator cores in low-power embedded processors have multiple on-chip memory modules to increase data access speed and to enable parallel data access. When large functional units such as multipliers and dividers are used for address generation, they consume considerable power and chip area. Recent low-power processors therefore use small functional units such as adders and counters for addressing. With such small functional units, however, complex addressing patterns are difficult to implement without duplicating data among multiple memory modules. This data duplication wastes memory capacity and significantly increases the data transfer time. This paper proposes a method to reduce data duplication for window-based image processing, which is widely used in many applications. Evaluations using an accelerator core show that the proposed method reduces both the amount of data and the data transfer time by more than 50%.
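To make the parallel-access problem concrete, the following minimal sketch compares two ways of assigning pixels to four memory modules for a 2x2 window. It is not the method proposed in the paper; the function names (`naive_module`, `tiled_module`, `is_conflict_free`) and the 8x8 image size are assumptions chosen for illustration. Both mappings use only modulo by a power of two and a small multiply by a power of two, which in hardware reduce to bit masks and shifts rather than multipliers or dividers.

```python
# Illustrative sketch only: not the allocation method of the paper.
# A window access is conflict-free when every pixel of the window lies in a
# different memory module, so all pixels can be read in parallel.

def naive_module(x, y, n_modules=4):
    """Column interleaving: the module depends on x only (hypothetical baseline)."""
    return x % n_modules

def tiled_module(x, y, w=2, h=2):
    """Tile interleaving over a w x h block: module = (x mod w) + w * (y mod h)."""
    return (x % w) + w * (y % h)

def is_conflict_free(module_of, width, height, w=2, h=2):
    """Check every w x h window position in a width x height image."""
    for y0 in range(height - h + 1):
        for x0 in range(width - w + 1):
            modules = {module_of(x0 + dx, y0 + dy)
                       for dy in range(h) for dx in range(w)}
            if len(modules) != w * h:  # two pixels share a module
                return False
    return True

if __name__ == "__main__":
    print("naive:", is_conflict_free(naive_module, 8, 8))
    print("tiled:", is_conflict_free(tiled_module, 8, 8))
```

With column interleaving, vertically adjacent pixels fall into the same module, so a 2x2 window cannot be read in one parallel access unless data is duplicated. The tiled mapping avoids this for the single window shape it was built for; handling more complex window patterns with equally simple addressing is the harder problem the abstract refers to.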
