Tiling for Dynamic Scheduling