Algorithms design for the parallelization of nested loops
[1] Yu Hen Hu,et al. A novel modular systolic array architecture for full-search block matching motion estimation , 1995, IEEE Trans. Circuits Syst. Video Technol..
[2] Mihalis Yannakakis,et al. Towards an architecture-independent analysis of parallel algorithms , 1990, STOC '88.
[3] Wentong Cai,et al. Time-minimal tiling when rise is larger than zero , 2002, Parallel Comput..
[4] Rupert W. Ford,et al. An investigation of feedback guided dynamic scheduling of nested loops , 2000, Proceedings 2000. International Workshop on Parallel Processing.
[5] Rudolf Eigenmann,et al. Automatic program parallelization , 1993, Proc. IEEE.
[6] Amit Rao,et al. Optimal task scheduling at run time to exploit intra-tile parallelism , 2003, Parallel Comput..
[7] Alan Weiss,et al. Allocating Independent Subtasks on Parallel Processors , 1985, IEEE Transactions on Software Engineering.
[8] Theodore Andronikos,et al. Cronus: A platform for parallel code generation based on computational geometry methods , 2008, J. Syst. Softw..
[9] Nectarios Koziris,et al. Lower Time and Processor Bounds for Efficient Mapping of Uniform Dependence Algorithms into Systolic Arrays , 1997, Parallel Algorithms Appl..
[10] Weijia Shang,et al. Time Optimal Linear Schedules for Algorithms with Uniform Dependencies , 1991, IEEE Trans. Computers.
[11] Utpal Banerjee,et al. Dependence analysis for supercomputing , 1988, The Kluwer international series in engineering and computer science.
[12] Jeanette P. Schmidt,et al. Load-sharing in heterogeneous systems via weighted factoring , 1996, SPAA '96.
[13] Eugene L. Lawler,et al. Scheduling In and Out Forests in the Presence of Communication Delays , 1996, IEEE Trans. Parallel Distributed Syst..
[14] Yves Robert,et al. Linear Scheduling Is Nearly Optimal , 1991, Parallel Process. Lett..
[15] Allan Gottlieb,et al. Highly parallel computing , 1989, Benjamin/Cummings Series in computer science and engineering.
[16] Edward D. Lazowska,et al. Adaptive load sharing in homogeneous distributed systems , 1986, IEEE Transactions on Software Engineering.
[17] Behrooz Parhami,et al. Introduction to Parallel Processing: Algorithms and Architectures , 1999 .
[18] Tarek S. Abdelrahman,et al. Exploiting Wavefront Parallelism on Large-Scale Shared-Memory Multiprocessors , 2001, IEEE Trans. Parallel Distributed Syst..
[19] Theodore Andronikos,et al. Reducing the Communication Cost via Chain Pattern Scheduling , 2005, Fourth IEEE International Symposium on Network Computing and Applications.
[20] Timothy G. Mattson,et al. Patterns for parallel programming , 2004 .
[21] Philippe Chrétienne. Task scheduling with interprocessor communication delays , 1992 .
[22] Yves Robert,et al. Resource-constrained scheduling of partitioned algorithms on processor arrays , 1995, Proceedings Euromicro Workshop on Parallel and Distributed Processing.
[23] T. Andronikos,et al. Adaptive Cyclic Scheduling of Nested Loops , 2005 .
[24] Larry Carter,et al. Sparse Tiling for Stationary Iterative Methods , 2004, Int. J. High Perform. Comput. Appl..
[25] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard , 1994 .
[26] Nectarios Koziris,et al. Chain Grouping: A Method for Partitioning Loops onto Mesh-Connected Processor Arrays , 2000, IEEE Trans. Parallel Distributed Syst..
[27] William Pugh,et al. The Omega test: A fast and practical integer programming algorithm for dependence analysis , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[28] Sanguthevar Rajasekaran,et al. Online Scheduling of Dynamic Trees , 1995, Parallel Process. Lett..
[29] Jon Feldman,et al. Parallel processor scheduling with delay constraints , 2001, SODA '01.
[30] Nectarios Koziris,et al. Geometric scheduling of 2-D uniform dependence loops , 2001, Proceedings. Eighth International Conference on Parallel and Distributed Systems. ICPADS 2001.
[31] Pierre Ramet,et al. Optimal Grain Size Computation for Pipelined Algorithms , 1996, Euro-Par, Vol. I.
[32] Berna L. Massingill. Patterns for Parallel Application Programs , 1999 .
[33] Nectarios Koziris,et al. Message-passing code generation for non-rectangular tiling transformations , 2006, Parallel Comput..
[34] I. Niven,et al. An introduction to the theory of numbers , 1961 .
[35] Leslie Lamport,et al. The parallel execution of DO loops , 1974, CACM.
[36] Marco Spuri,et al. Implications of Classical Scheduling Results for Real-Time Systems , 1995, Computer.
[37] Theodore Andronikos,et al. Self-Adapting Scheduling for Tasks with Dependencies in Stochastic Environments , 2006, 2006 IEEE International Conference on Cluster Computing.
[38] P. Theodoropoulos,et al. Code Generation for General Loops Using Methods from Computational Geometry , 2004 .
[39] Yves Robert,et al. Resource-constrained scheduling of partitioned algorithms on processor arrays , 1996, Integr..
[40] Jang-Ping Sheu,et al. Partitioning and mapping of nested loops for linear array multicomputers , 1995, The Journal of Supercomputing.
[41] Jeffrey D. Ullman,et al. NP-Complete Scheduling Problems , 1975, J. Comput. Syst. Sci..
[42] Nectarios Koziris,et al. Evaluation of loop grouping methods based on orthogonal projection spaces , 2000, Proceedings 2000 International Conference on Parallel Processing.
[43] C. Q. Lee,et al. The Computer Journal , 1958, Nature.
[44] Edith Schonberg,et al. Factoring: a method for scheduling parallel loops , 1992 .
[45] Yu Hen Hu,et al. A novel modular systolic array architecture for full-search block matching motion estimation , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.
[46] Constantine D. Polychronopoulos,et al. Guided Self-Scheduling: A Practical Scheduling Scheme for Parallel Supercomputers , 1987, IEEE Transactions on Computers.
[47] Dan I. Moldovan,et al. Parallel processing - from applications to systems , 1993 .
[48] François Irigoin,et al. Supernode partitioning , 1988, POPL '88.
[49] Seth Copen Goldstein,et al. Active messages: a mechanism for integrating communication and computation , 1998, ISCA '98.
[50] David K. Lowenthal,et al. Accurately Selecting Block Size at Runtime in Pipelined Parallel Programs , 2000, International Journal of Parallel Programming.
[51] Yves Robert,et al. On the Removal of Anti- and Output-Dependences , 2004, International Journal of Parallel Programming.
[52] David P. Dobkin,et al. The quickhull algorithm for convex hulls , 1996, TOMS.
[53] Joseph H. Silverman,et al. A Friendly Introduction to Number Theory , 1996 .
[54] Theodore Andronikos,et al. Scheduling Nested Loops with the Least Number of Processors , 2003, Applied Informatics.
[55] Monica S. Lam,et al. A data locality optimizing algorithm , 1991, PLDI '91.
[56] Anthony T. Chronopoulos,et al. Optimal synchronization frequency for dynamic pipelined computations on heterogeneous systems , 2007, 2007 IEEE International Conference on Cluster Computing.
[57] Dan I. Moldovan,et al. Partitioning and Mapping Algorithms into Fixed Size Systolic Arrays , 1986, IEEE Transactions on Computers.
[58] Anthony T. Chronopoulos,et al. Studying the impact of synchronization frequency on scheduling tasks with dependencies in heterogeneous systems , 2007, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).
[59] Evangelos P. Markatos,et al. Using processor affinity in loop scheduling on shared-memory multiprocessors , 1992, Supercomputing '92.
[60] Theodore Andronikos,et al. On parallelization of UET / UET-UCT loops , 2001, Neural Parallel Sci. Comput..
[61] Yves Robert,et al. (Pen)-ultimate tiling? , 1994, Integr..
[62] L.M. Ni,et al. Trapezoid Self-Scheduling: A Practical Scheduling Scheme for Parallel Compilers , 1993, IEEE Trans. Parallel Distributed Syst..
[63] Benoit B. Mandelbrot,et al. Fractal Geometry of Nature , 1984 .
[64] Arif Ghafoor,et al. Semi-Distributed Load Balancing For Massively Parallel Multicomputer Systems , 1991, IEEE Trans. Software Eng..
[65] Nectarios Koziris,et al. Optimal Time and Efficient Space Free Scheduling For Nested Loops , 1996, Comput. J..
[66] Oliver Sinnen,et al. Task Scheduling for Parallel Systems , 2007, Wiley series on parallel and distributed computing.
[67] Thomas Kunz,et al. The Influence of Different Workload Descriptions on a Heuristic Load Balancing Scheme , 1991, IEEE Trans. Software Eng..
[68] David S. Johnson,et al. Computers and Intractability: A Guide to the Theory of NP-Completeness , 1978 .
[69] H. Ali,et al. Task Scheduling in Multiprocessing Systems , 1995, Computer.
[70] Shiping Chen,et al. Partitioning and scheduling loops on NOWs , 1999, Comput. Commun..
[71] Jang-Ping Sheu,et al. Partitioning and Mapping Nested Loops on Multiprocessor Systems , 1991, IEEE Trans. Parallel Distributed Syst..
[72] Theodore Andronikos,et al. An Efficient Scheduling of Uniform Dependence Loops , 2003 .
[73] T. Andronikos,et al. Simple Code Generation for special UDLs , 2003 .
[74] Seth Copen Goldstein,et al. Active messages: a mechanism for integrating communication and computation , 1998, ISCA '98.
[75] F. H. McMahon,et al. The Livermore Fortran Kernels: A Computer Test of the Numerical Performance Range , 1986 .
[76] Anthony T. Chronopoulos,et al. Dynamic scheduling for dependence loops on heterogeneous clusters , 2006 .
[77] Peter S. Pacheco. Parallel programming with MPI , 1996 .
[78] Jingling Xue,et al. Loop Tiling for Parallelism , 2000, Kluwer International Series in Engineering and Computer Science.
[79] Sanjay V. Rajopadhye,et al. Optimal Orthogonal Tiling of 2-D Iterations , 1997, J. Parallel Distributed Comput..
[80] Anthony T. Chronopoulos,et al. Enhancing self-scheduling algorithms via synchronization and weighting , 2008, J. Parallel Distributed Comput..
[81] Jingling Xue,et al. On Tiling as a Loop Transformation , 1997, Parallel Process. Lett..
[82] Nectarios Koziris,et al. Geometric Pattern Prediction and Scheduling of Uniform Dependence Loops , 2001 .
[83] Chung-Ta King,et al. Pipelined Data Parallel Algorithms-II: Design , 1990, IEEE Trans. Parallel Distributed Syst..
[84] Anthony T. Chronopoulos,et al. Dynamic multi phase scheduling for heterogeneous clusters , 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.