Carrot-hole Data Scheduling and Adaptive Partitioning for Memory Traffic Minimization

Massive uniform nested loops are widely used in scientific and multi-dimensional digital signal processing (DSP) applications. Because of the amount of data these applications handle, cache or on-chip memory is required to improve data access and overall system performance. Most existing application-specific systems do not efficiently optimize access to the different levels of the memory hierarchy. This study proposes a static data scheduling method, carrot-hole data scheduling, for multi-dimensional applications represented by multi-dimensional data flow graphs, in order to control the data traffic between different levels of memory. Based on this data schedule, optimal partitioning and scheduling are selected. The partition size is also chosen so as to minimize memory access overhead. Experiments show that this technique significantly reduces on-chip memory misses compared with traditional methods. The carrot-hole data scheduling method is proven to incur the fewest on-chip memory misses compared with other linear scheduling and partitioning schemes.
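To make the effect of partition shape on memory traffic concrete, the following is a minimal sketch and not the carrot-hole method itself. It assumes a two-dimensional uniform loop nest with constant dependence vectors, rectangular partitions, and a simplified cost model in which every dependence whose producer iteration lies in a different partition counts as one off-partition access. The function name `count_cross_partition_accesses` and all parameters are hypothetical, introduced only for illustration.

```python
def count_cross_partition_accesses(n, m, tile_h, tile_w, deps):
    """Count dependences whose producer iteration lies in a different tile.

    Assumed model (not taken from the paper): an n x m iteration space is
    cut into rectangular tiles of tile_h x tile_w iterations, and each
    uniform dependence vector (di, dj) means iteration (i, j) reads a value
    produced by iteration (i - di, j - dj).
    """
    crossings = 0
    for i in range(n):
        for j in range(m):
            for di, dj in deps:
                pi, pj = i - di, j - dj
                # Producers outside the iteration space are boundary inputs
                # and are ignored in this simplified count.
                if pi < 0 or pj < 0 or pi >= n or pj >= m:
                    continue
                # A crossing occurs when producer and consumer fall in
                # different tiles.
                if (pi // tile_h, pj // tile_w) != (i // tile_h, j // tile_w):
                    crossings += 1
    return crossings


if __name__ == "__main__":
    deps = [(1, 0)]  # each iteration depends on the previous row
    # Three partition shapes with the same area (4 iterations per tile).
    for tile_h, tile_w in [(2, 2), (4, 1), (1, 4)]:
        total = count_cross_partition_accesses(4, 4, tile_h, tile_w, deps)
        print(f"tile {tile_h}x{tile_w}: {total} cross-partition accesses")
```

Under this assumed model, partitions of equal area can differ widely in the number of boundary-crossing dependences, which is the kind of trade-off that partition-size selection aims to exploit.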
