Optimally Maximizing Iteration-Level Loop Parallelism
Minyi Guo | Zili Shao | Yi Wang | Duo Liu | Jingling Xue
[1] Weijia Shang, et al. On Loop Transformations for Generalized Cycle Shrinking, 1994, IEEE Trans. Parallel Distributed Syst.
[2] Kunio Okuda, et al. Cycle Shrinking by Dependence Reduction, 1996, Euro-Par, Vol. I.
[3] Monica S. Lam, et al. A Loop Transformation Theory and an Algorithm to Maximize Parallelism, 1991, IEEE Trans. Parallel Distributed Syst.
[4] Chih-Ping Chu, et al. Exploitation of parallelism to nested loops with dependence cycles, 2004, J. Syst. Archit.
[5] Minyi Guo, et al. Optimal loop parallelization for maximizing iteration-level parallelism, 2009, CASES '09.
[6] David Alejandro Padua Haiek. Multiprocessors: discussion of some theoretical and practical problems, 1980.
[7] David A. Padua, et al. High-Speed Multiprocessors and Compilation Techniques, 1980, IEEE Transactions on Computers.
[8] J. K. Peir. Program partitioning and synchronization on multiprocessor systems, 1986.
[9] Jingling Xue, et al. Loop Tiling for Parallelism, 2000, Kluwer International Series in Engineering and Computer Science.
[10] Edwin Hsing-Mean Sha, et al. Retiming synchronous data-flow graphs to reduce execution time, 2001, IEEE Trans. Signal Process.
[11] Constantine D. Polychronopoulos. Compiler Optimizations for Enhancing Parallelism and Their Impact on Architecture Design, 1988, IEEE Trans. Computers.
[12] Pierre Boulet, et al. Loop Parallelization Algorithms: From Parallelism Extraction to Code Generation, 1998, Parallel Comput.
[13] Jih-Kwon Peir, et al. Minimum Distance: A Method for Partitioning Recurrences for Multiprocessors, 1989, IEEE Trans. Computers.
[14] Lubomir F. Bic, et al. Exploiting iteration-level parallelism in dataflow programs, 1992, Proceedings of the 12th International Conference on Distributed Computing Systems.
[15] Jang-Ping Sheu, et al. On the Parallelism of Nested For-Loops Using Index Shift Method, 1990, ICPP.
[16] Josep Torrellas, et al. An efficient algorithm for the run-time parallelization of DOACROSS loops, 1994, Proceedings of Supercomputing '94.
[17] Tarek S. Abdelrahman, et al. Fusion of Loops for Parallelism and Locality, 1997, IEEE Trans. Parallel Distributed Syst.
[18] Chien-Min Wang, et al. Compiler techniques to extract parallelism within a nested loop, 1991, Proceedings The Fifteenth Annual International Computer Software & Applications Conference.
[19] Robert J. Fowler, et al. Generalized multipartitioning of multi-dimensional arrays for parallelizing line-sweep computations, 2003, J. Parallel Distributed Comput.
[20] Doris L. Carver, et al. Reordering the statements with dependence cycles to improve the performance of parallel loops, 1997, Proceedings 1997 International Conference on Parallel and Distributed Systems.
[21] Pen-Chung Yew, et al. Statement Re-ordering for DOACROSS Loops, 1994, ICPP.
[22] Zhiyuan Li, et al. An Efficient Data Dependence Analysis for Parallelizing Compilers, 1990, IEEE Trans. Parallel Distributed Syst.
[23] Wayne H. Wolf, et al. TGFF: task graphs for free, 1998, Proceedings of the Sixth International Workshop on Hardware/Software Codesign (CODES/CASHE'98).
[24] Alain Darte, et al. Complexity of Multi-dimensional Loop Alignment, 2002, STACS.
[25] Anne Mignotte, et al. Source Code Loop Transformations for Memory Hierarchy Optimizations, 2001, PACT 2001.
[26] Alexander Aiken, et al. Optimal loop parallelization, 1988, PLDI '88.
[27] Yves Robert, et al. Revisiting cycle shrinking, 1992, Parallel Comput.
[28] Charles E. Leiserson, et al. Retiming synchronous circuitry, 1988, Algorithmica.
[29] Ken Kennedy, et al. Optimizing Compilers for Modern Architectures: A Dependence-based Approach, 2001.
[30] Liang-Fang Chao, et al. Scheduling and behavioral transformation for parallel systems, 1993.
[31] D. N. Jayasimha, et al. Some architectural and compilation issues in the design of hierarchical shared memory multiprocessors, 1992, Proceedings Sixth International Parallel Processing Symposium.
[32] Edwin Hsing-Mean Sha, et al. Polynomial-time nested loop fusion with full parallelism, 1996, Proceedings of the 1996 ICPP Workshop on Challenges for Parallel Processing.
[33] Pen-Chung Yew, et al. Redundant Synchronization Elimination for DOACROSS Loops, 1999, IEEE Trans. Parallel Distributed Syst.
[34] Pen-Chung Yew. Is there exploitable thread-level parallelism in general-purpose application programs?, 2003, Proceedings International Parallel and Distributed Processing Symposium.
[35] Edwin Hsing-Mean Sha, et al. Extended retiming: optimal scheduling via a graph-theoretical approach, 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99.
[36] Constantine D. Polychronopoulos, et al. Advanced Loop Optimizations for Parallel Computers, 1988, ICS.
[37] Robert E. Tarjan, et al. Depth-First Search and Linear Graph Algorithms, 1972, SIAM J. Comput.
[38] Frédéric Vivien, et al. Combining Retiming and Scheduling Techniques for Loop Parallelization and Loop Tiling, 1997, Parallel Process. Lett.