Paper information - Graph analysis and transformation techniques for runtime minimization in multi-threaded architectures

This paper describes a method of analysis for detecting and minimizing memory latency using a directed data dependency graph produced by a compiler. The results are applicable to the development of methods for the optimal generation of instruction threads to be executed on a multi-threaded, data-driven architecture. The resulting runtime reductions are achieved by minimizing the memory access times of individual processing elements. In addition, the analysis methods can be used to predict measures of achievable parallelism for a given program graph, which a reconfigurable, multi-threaded architecture can then exploit.
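As a rough illustration of what "predicting a measure of achievable parallelism" from a dependency graph can look like, the sketch below computes total work, critical-path length, and their ratio (average parallelism) for a small directed acyclic graph. This is a generic textbook measure, not the method of the paper; the function name `parallelism_profile`, the node costs, and the example graph are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque


def parallelism_profile(costs, deps):
    """Return (total_work, critical_path, average_parallelism) for a DAG.

    costs: dict mapping node name -> execution cost (hypothetical units)
    deps:  list of (producer, consumer) data-dependency edges
    """
    successors = defaultdict(list)
    indegree = {node: 0 for node in costs}
    for producer, consumer in deps:
        successors[producer].append(consumer)
        indegree[consumer] += 1

    # Kahn's algorithm: process nodes in topological order.
    ready = deque(node for node in costs if indegree[node] == 0)
    start = {node: 0 for node in costs}   # earliest possible start time
    finish = {}                           # earliest possible finish time
    visited = 0
    while ready:
        node = ready.popleft()
        visited += 1
        finish[node] = start[node] + costs[node]
        for nxt in successors[node]:
            # A consumer cannot start before all of its producers finish.
            start[nxt] = max(start[nxt], finish[node])
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if visited != len(costs):
        raise ValueError("dependency graph contains a cycle")

    total_work = sum(costs.values())
    critical_path = max(finish.values(), default=0)
    average = total_work / critical_path if critical_path else 0.0
    return total_work, critical_path, average


# Hypothetical graph: memory operations cost 4 units, arithmetic costs 1.
costs = {"load_a": 4, "load_b": 4, "mul": 1, "add": 1, "store": 4}
deps = [
    ("load_a", "mul"), ("load_b", "mul"),
    ("load_a", "add"),
    ("mul", "store"), ("add", "store"),
]
work, span, avg = parallelism_profile(costs, deps)
print(work, span, round(avg, 3))
```

In this toy graph the memory operations dominate the critical path, which is the kind of situation where reducing memory access time shortens the overall runtime.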

Mitchell A. Thornton | D. L. Andrews
