Automatic Parallelization of Loop Programs for Distributed Memory Architectures
[1] Thomas Kailath,et al. Regular iterative algorithms and their implementation on processor arrays , 1988, Proc. IEEE.
[2] Albert Cohen. Program Analysis and Transformation: From the Polytope Model to Formal Languages. (Analyse et transformation de programmes: du modèle polyédrique aux langages formels) , 1999 .
[3] Albert Cohen, Jean-Fran. Array Data-flow Analysis for Imperative Recursive Programs , 1996 .
[4] Jingling Xue,et al. On Tiling as a Loop Transformation , 1997, Parallel Process. Lett..
[5] Alain Darte,et al. Automatic Parallelization Based on Multi-Dimensional Scheduling , 1994 .
[6] David B. Skillicorn,et al. Questions and Answers about BSP , 1997, Sci. Program..
[7] Frédéric Vivien,et al. On the Optimality of Allen and Kennedy's Algorithm for Parallelism Extraction in Nested Loops , 1996, Parallel Algorithms Appl..
[8] Hiroshi Ohta,et al. Optimal tile size adjustment in compiling general DOACROSS loop nests , 1995, ICS '95.
[9] Yves Robert,et al. Linear Scheduling Is Nearly Optimal , 1991, Parallel Process. Lett..
[10] Michael Wolfe,et al. More iteration space tiling , 1989, Proceedings of the 1989 ACM/IEEE Conference on Supercomputing (Supercomputing '89).
[11] William Pugh,et al. Eliminating false data dependences using the Omega test , 1992, PLDI '92.
[12] Sanjay V. Rajopadhye,et al. Optimal Orthogonal Tiling of 2-D Iterations , 1997, J. Parallel Distributed Comput..
[13] Frédéric Vivien,et al. Scheduling the Computations of a Loop Nest with Respect to a Given Mapping , 2000, Euro-Par.
[14] Aart J. C. Bik,et al. Automatically exploiting implicit parallelism in Java , 1997, Concurr. Pract. Exp..
[15] Albert Cohen,et al. Maximal Static Expansion , 1998, POPL '98.
[16] Cédric Bastoul,et al. Efficient code generation for automatic parallelization and optimization , 2003, Second International Symposium on Parallel and Distributed Computing, 2003. Proceedings..
[17] Paul Feautrier,et al. Fuzzy Array Dataflow Analysis , 1997, J. Parallel Distributed Comput..
[18] William Pugh,et al. The Omega Library interface guide , 1995 .
[19] Martin Griebl,et al. Application of the Polytope Model to Functional Programs , 1999, LCPC.
[20] Patrice Quinton,et al. The mapping of linear recurrence equations on regular arrays , 1989, J. VLSI Signal Process..
[21] Martin Griebl,et al. Code generation in the polytope model , 1998, Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192).
[22] John A. Chandy,et al. Communication Optimizations Used in the Paradigm Compiler for Distributed-Memory Multicomputers , 1994, 1994 International Conference on Parallel Processing Vol. 2.
[23] Weijia Shang,et al. On Time Optimal Supernode Shape , 2002, IEEE Trans. Parallel Distributed Syst..
[24] Leslie Lamport,et al. The parallel execution of DO loops , 1974, CACM.
[25] Mohamed Jemni,et al. On the parallelization of single dynamic conditional loops , 1996, Simul. Pract. Theory.
[26] Martin Griebl,et al. A Precise Fixpoint Reaching Definition Analysis for Arrays , 1999, LCPC.
[27] Geoffrey C. Fox,et al. A High Level SPMD Programming Model: HPspmd and its Java Language Binding , 1998 .
[28] Martin Griebl,et al. Termination detection in parallel loop nests with while loops , 1999, Parallel Comput..
[29] Alfred V. Aho,et al. Compilers: Principles, Techniques, and Tools , 1986, Addison-Wesley series in computer science / World student series edition.
[30] Steven K. Feiner,et al. Introduction to Computer Graphics , 1993 .
[31] Keshav Pingali,et al. Tiling Imperfectly-nested Loop Nests (REVISED) , 2000 .
[32] Zhiyuan Li,et al. A Compiler Framework for Tiling Imperfectly-Nested Loops , 1999, LCPC.
[33] Aart J. C. Bik,et al. Automatically exploiting implicit parallelism in Java , 1997 .
[34] Doran Wilde,et al. A Library for Doing Polyhedral Operations , 2000 .
[35] Philippe Clauss. Counting Solutions to Linear and Nonlinear Constraints Through Ehrhart Polynomials: Applications to Analyze and Transform Scientific Programs , 1996, International Conference on Supercomputing.
[36] Martin Griebl,et al. Forward Communication Only Placements and Their Use for Parallel Program Construction , 2002, LCPC.
[37] Yves Robert,et al. Mapping Uniform Loop Nests Onto Distributed Memory Architectures , 1993, Parallel Comput..
[38] Frédéric Vivien,et al. Optimal Fine and Medium Grain Parallelism Detection in Polyhedral Reduced Dependence Graphs , 2004, International Journal of Parallel Programming.
[39] Monica S. Lam,et al. A Loop Transformation Theory and an Algorithm to Maximize Parallelism , 1991, IEEE Trans. Parallel Distributed Syst..
[40] Katherine A. Yelick,et al. Titanium: A High-performance Java Dialect , 1998, Concurr. Pract. Exp..
[41] Larry Carter,et al. Languages and compilers for parallel computing : 12th International Workshop, LCPC'99, La Jolla, CA, USA, August 4-6, 1999 : proceedings , 2000 .
[42] Michael Philippsen,et al. JavaParty - Transparent Remote Objects in Java , 1997, Concurr. Pract. Exp..
[43] Corinne Ancourt,et al. Scanning polyhedra with DO loops , 1991, PPOPP '91.
[44] Utpal Banerjee. Loop Parallelization , 1994, Springer US.
[45] Larry Carter,et al. Selecting tile shape for minimal execution time , 1999, SPAA '99.
[46] J. P. Burg,et al. Maximum entropy spectral analysis. , 1967 .
[47] Sanjay V. Rajopadhye,et al. Optimal semi-oblique tiling , 2001, SPAA '01.
[48] Paul Feautrier,et al. Dataflow analysis of array and scalar references , 1991, International Journal of Parallel Programming.
[49] Ian Foster,et al. Designing and building parallel programs , 1994 .
[50] Sanjay V. Rajopadhye,et al. Generation of Efficient Nested Loops from Polyhedra , 2000, International Journal of Parallel Programming.
[51] David K. Smith. Theory of Linear and Integer Programming , 1987 .
[52] Geoffrey C. Fox. Java for computational science and engineering – simulation and modeling II , 1997 .
[53] M. Birkner,et al. Blow-up of semilinear PDE's at the critical dimension. A probabilistic approach , 2002 .
[54] Patrice Quinton,et al. The ALPHA language and its use for the design of systolic arrays , 1991, J. VLSI Signal Process..
[55] D.I. Moldovan,et al. On the design of algorithms for VLSI systolic arrays , 1983, Proceedings of the IEEE.
[56] Monica S. Lam,et al. Maximizing Parallelism and Minimizing Synchronization with Affine Partitions , 1998, Parallel Comput..
[57] Aart J. C. Bik,et al. Advanced Compiler Optimizations for Sparse Computations , 1995, J. Parallel Distributed Comput..
[58] Sanjay V. Rajopadhye,et al. Optimal Orthogonal Tiling , 1998, Euro-Par.
[59] Carl-Erik Fröberg,et al. Numerical mathematics - theory and computer applications , 1985 .
[60] Larry Carter,et al. Determining the idle time of a tiling , 1997, POPL '97.
[61] Utpal Banerjee,et al. Speedup of ordinary programs , 1979 .
[62] Utpal Banerjee,et al. Loop Transformations for Restructuring Compilers: The Foundations , 1993, Springer US.
[63] Laurence A. Wolsey,et al. Integer and Combinatorial Optimization , 1988 .
[64] J. Kiefer,et al. Sequential minimax search for a maximum , 1953 .
[65] Jean-Francois Collard. Code Generation in Automatic Parallelizers , 1994, Applications in Parallel and Distributed Computing.
[66] Alexander Schrijver,et al. Theory of linear and integer programming , 1986, Wiley-Interscience series in discrete mathematics and optimization.
[67] J. Cadzow. Maximum Entropy Spectral Analysis , 2006 .
[68] W. Kelly,et al. Code generation for multiple mappings , 1995, Proceedings Frontiers '95. The Fifth Symposium on the Frontiers of Massively Parallel Computation.
[69] Martin Griebl,et al. Array Dataflow Analysis for Explicitly Parallel Programs , 1996, Euro-Par, Vol. I.
[70] Paul Feautrier,et al. Automatic Storage Management for Parallel Programs , 1998, Parallel Comput..
[71] I N Bronstein,et al. Taschenbuch der Mathematik , 1966 .
[72] Thomas Brandes. The importance of direct dependences for automatic parallelization , 1988, ICS '88.
[73] Paul Feautrier,et al. Fuzzy array dataflow analysis , 1995, PPOPP '95.
[74] J. Ramanujam,et al. Non-unimodular transformations of nested loops , 1992, Proceedings Supercomputing '92.
[75] Martin Griebl,et al. Array Dataflow Analysis for Explicitly Parallel Programs , 1997, Parallel Process. Lett..
[76] Robert W. Floyd,et al. The Language of Machines: an Introduction to Computability and Formal Languages , 1994 .
[77] Andreas Krall,et al. Efficient JavaVM just-in-time compilation , 1998, Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192).
[78] François Irigoin,et al. Supernode partitioning , 1988, POPL '88.
[79] J. Ramanujam,et al. Beyond unimodular transformations , 1995, The Journal of Supercomputing.
[80] Jack Dongarra,et al. Automatic Blocking of Nested Loops , 1990 .
[81] Barbara M. Chapman,et al. Supercompilers for parallel and vector computers , 1990, ACM Press frontier series.
[82] Michael Wolfe,et al. High performance compilers for parallel computing , 1995 .
[83] Curtis F. Gerald,et al. Applied Numerical Analysis , 1972, The Mathematical Gazette.
[84] Yves Robert,et al. Mapping affine loop nests: new results , 1995, HPCN Europe.
[85] E. A. Maxwell. Book Reviews: The Methods of Plane Projective Geometry Based on the Use of General Homogeneous Coordinates , 1946 .
[86] Martin Griebl,et al. The Loop Parallelizer LooPo-Announcement , 1996, LCPC.
[87] Jingling Xue,et al. Reuse-Driven Tiling for Improving Data Locality , 1998, International Journal of Parallel Programming.
[88] Christian Lengauer,et al. Loop Parallelization in the Polytope Model , 1993, CONCUR.
[89] Paul Feautrier,et al. Some efficient solutions to the affine scheduling problem. Part II. Multidimensional time , 1992, International Journal of Parallel Programming.
[90] Martin Griebl,et al. Issues of the Automatic Generation of HPF Loop Programs , 2000, LCPC.
[91] Martin Griebl,et al. Data Flow Analysis of Recursive Structures , 1996 .
[92] Ken Kennedy,et al. Evaluating Compiler Optimizations for Fortran D , 1994, J. Parallel Distributed Comput..
[93] Erik H. D'Hollander,et al. Partitioning and Labeling of Loops by Unimodular Transformations , 1992, IEEE Trans. Parallel Distributed Syst..
[94] Lawrence Rauchwerger,et al. The LRPD test: speculative run-time parallelization of loops with privatization and reduction parallelization , 1995, PLDI '95.
[95] Frédéric Vivien,et al. A unified framework for schedule and storage optimization , 2001, PLDI '01.
[96] P. Feautrier. Parametric integer programming , 1988 .
[97] Jürgen Teich,et al. Partitioning of processor arrays: a piecewise regular approach , 1993, Integr..
[98] Martin Griebl,et al. Replicated Placements in the Polyhedron Model , 2003, Euro-Par.
[99] Martin Griebl,et al. Applicability of the Polytope Model to Functional Programs , 1998 .
[100] Yonghong Song,et al. Unroll-and-jam for imperfectly-nested loops in DSP applications , 2000, CASES '00.
[101] Patrice Quinton,et al. The systematic design of systolic arrays , 1987 .
[102] Paul Feautrier,et al. Automatic Parallelization in the Polytope Model , 1996, The Data Parallel Programming Model.
[103] Frank Harary,et al. Graph Theory , 2016 .
[104] Hyuk-Jae Lee,et al. Communication-Minimal Partitioning and Data Alignment for Affine Nested Loops , 1997, Comput. J..
[105] Martin Griebl. The mechanical parallelization of loop nests containing while loops , 1997 .
[106] Michael Philippsen,et al. JavaParty – transparent remote objects in Java , 1997 .
[107] Keshav Pingali,et al. Synthesizing Transformations for Locality Enhancement of Imperfectly-Nested Loop Nests , 2001, International Journal of Parallel Programming.
[108] Gilles Villard,et al. Lattice-based memory allocation , 2003, IEEE Transactions on Computers.
[109] Sanjay V. Rajopadhye,et al. Optimizing memory usage in the polyhedral model , 2000, TOPL.
[110] Paul Feautrier. Toward Automatic Distribution , 1994, Parallel Process. Lett..
[111] Dan I. Moldovan,et al. Partitioning and Mapping Algorithms into Fixed Size Systolic Arrays , 1986, IEEE Transactions on Computers.
[112] Ken Kennedy,et al. Automatic translation of FORTRAN programs to vector form , 1987, TOPL.
[113] A. J. C. Bik,et al. Advanced compiler optimizations for sparse computations , 1993, Supercomputing '93.
[114] Daniel A. Reed,et al. Stencils and Problem Partitionings: Their Influence on the Performance of Multiple Processor Systems , 1987, IEEE Transactions on Computers.
[115] William Pugh,et al. Static analysis of upper and lower bounds on dependences and parallelism , 1994, TOPL.
[116] Yves Robert,et al. Loop Parallelization Algorithms , 2001, Compiler Optimizations for Scalable Parallel Systems Languages.
[117] Jingling Xue,et al. Communication-Minimal Tiling of Uniform Dependence Loops , 1996, J. Parallel Distributed Comput..
[118] Richard M. Karp,et al. The Organization of Computations for Uniform Recurrence Equations , 1967, JACM.
[119] Martin Griebl,et al. Index Set Splitting , 2000, International Journal of Parallel Programming.
[120] Paul Feautrier,et al. Some efficient solutions to the affine scheduling problem. I. One-dimensional time , 1992, International Journal of Parallel Programming.
[121] Michael Wolfe,et al. Iteration Space Tiling for Memory Hierarchies , 1987, PPSC.
[122] Katherine Yelick,et al. Titanium: a high-performance Java dialect , 1998 .
[123] Peiyi Tang,et al. Dynamic Processor Self-Scheduling for General Parallel Nested Loops , 1987, IEEE Trans. Computers.
[124] Yves Robert,et al. Determining the idle time of a tiling: new results , 1997, Proceedings 1997 International Conference on Parallel Architectures and Compilation Techniques.
[125] Mohamed Jemni,et al. Restructuring and Parallelizing a Static Conditional Loop , 1995, Parallel Comput..
[126] D. Sorensen. Numerical methods for large eigenvalue problems , 2002, Acta Numerica.
[127] Guy L. Steele,et al. The High Performance Fortran Handbook , 1993 .
[128] Armin Größlinger,et al. Introducing Non-linear Parameters to the Polyhedron Model , 2004 .
[129] Arthur J. Bernstein,et al. Analysis of Programs for Parallel Processing , 1966, IEEE Trans. Electron. Comput..