Supernode transformation on GPGPUs