Optimal Data Scheduling for Uniform Multidimensional Applications

Uniform nested loops are widely used in scientific and multidimensional digital signal processing applications. Because such applications handle large amounts of data, on-chip memory is needed to speed up data access and improve overall system performance. This study proposes a static data scheduling method, carrot-hole data scheduling, for multidimensional applications in order to control the data traffic between different levels of memory. Based on this data schedule, an optimal partitioning and schedule are selected. Experiments show that this technique significantly reduces on-chip memory misses compared with traditional methods. The carrot-hole data scheduling method is proven to yield the fewest on-chip memory misses compared with other linear scheduling and partitioning schemes.
