On-chip cache hierarchy-aware tile scheduling for multicore machines
Mahmut T. Kandemir | Yuanrui Zhang | Wei Ding | Jun Liu
[1] William J. Dally,et al. Compilation for explicitly managed memory hierarchies , 2007, PPOPP.
[2] Vivek Sarkar,et al. An analytical model for loop tiling and its solution , 2000, 2000 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS.
[3] Uday Bondhugula,et al. Compiler-assisted dynamic scheduling for effective parallelization of loop nests on multicore processors , 2009, PPoPP '09.
[4] Vincent Loechner,et al. Counting Integer Points in Parametric Polytopes Using Barvinok's Rational Functions , 2007, Algorithmica.
[5] Lawrence Rauchwerger,et al. Design and Use of htalib - A Library for Hierarchically Tiled Arrays , 2006, LCPC.
[6] Mahmut T. Kandemir,et al. Compiler algorithms for optimizing locality and parallelism on shared and distributed memory machines , 1997, Proceedings 1997 International Conference on Parallel Architectures and Compilation Techniques.
[7] François Irigoin,et al. Supernode partitioning , 1988, POPL '88.
[8] Zhiyuan Li,et al. New tiling techniques to improve cache temporal locality , 1999, PLDI '99.
[9] Kathryn S. McKinley,et al. Tile size selection using cache organization and data layout , 1995, PLDI '95.
[10] Mahmut T. Kandemir,et al. Compiler Algorithms for Optimizing Locality and Parallelism on Shared and Distributed-Memory Machines , 2000, J. Parallel Distributed Comput..
[11] David A. Padua,et al. Hierarchically tiled arrays for parallelism and locality , 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.
[12] Albert Cohen,et al. Iterative Optimization in the Polyhedral Model: Part I, One-Dimensional Time , 2007, International Symposium on Code Generation and Optimization (CGO'07).
[13] Nectarios Koziris,et al. Selecting the tile shape to reduce the total communication volume , 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.
[14] Sriram Krishnamoorthy,et al. Parametric multi-level tiling of imperfectly nested loops , 2009, ICS.
[15] S. Krishnamoorthy,et al. Affine Transformations for Communication Minimal Parallelization and Locality Optimization of Arbitrarily Nested Loop Sequences , 2007 .
[16] Alexander V. Veidenbaum,et al. Cache-aware partitioning of multi-dimensional iteration spaces , 2009, SYSTOR '09.
[17] Cédric Bastoul,et al. Code generation in the polyhedral model is easier than you think , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004.
[18] Jingling Xue,et al. Reuse-Driven Tiling for Improving Data Locality , 1998, International Journal of Parallel Programming.
[19] Refael Hassin,et al. Approximation Algorithms for Minimum K-Cut , 2000, Algorithmica.
[20] Monica S. Lam,et al. A data locality optimizing algorithm , 1991, PLDI '91.
[21] Jim Held. "Single-chip Cloud Computer", an IA Tera-scale Research Processor , 2010, Euro-Par Workshops.
[22] Larry Carter,et al. Selecting tile shape for minimal execution time , 1999, SPAA '99.
[23] J. H. Wilkinson,et al. Handbook for Automatic Computation. Vol II, Linear Algebra , 1973 .
[24] Mahmut T. Kandemir,et al. Cache topology aware computation mapping for multicores , 2010, PLDI '10.
[25] Uday Bondhugula,et al. Automatic Transformations for Communication-Minimized Parallelization and Locality Optimization in the Polyhedral Model , 2008, CC.
[26] P. Sadayappan,et al. Iteration space tiling for distributed memory machines , 1992 .
[27] Mahmut T. Kandemir,et al. Optimizing shared cache behavior of chip multiprocessors , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[28] Monica S. Lam,et al. An affine partitioning algorithm to maximize parallelism and minimize communication , 1999, ICS '99.
[29] Monica S. Lam,et al. Maximizing parallelism and minimizing synchronization with affine transforms , 1997, POPL '97.
[30] Mikel Luján,et al. Adaptive Loop Tiling for a Multi-cluster CMP , 2008, ICA3PP.
[31] Monica S. Lam,et al. Data Dependence and Data-Flow Analysis of Arrays , 1992, LCPC.
[32] Max Baron. The single-chip cloud computer , 2010 .
[33] Alexander Schrijver,et al. Theory of linear and integer programming , 1986, Wiley-Interscience series in discrete mathematics and optimization.
[34] Keshav Pingali,et al. Data-centric multi-level blocking , 1997, PLDI '97.
[35] Michael Wolfe,et al. More iteration space tiling , 1989, Proceedings of the 1989 ACM/IEEE Conference on Supercomputing (Supercomputing '89).
[36] David Parello,et al. Facilitating the search for compositions of program transformations , 2005, ICS '05.
[37] Uday Bondhugula,et al. A practical automatic polyhedral parallelizer and locality optimizer , 2008, PLDI '08.
[38] Monica S. Lam,et al. Blocking and array contraction across arbitrarily nested loops using affine partitioning , 2001, PPoPP '01.
[39] Bowen Alpern,et al. Modeling parallel computers as memory hierarchies , 1993, Proceedings of Workshop on Programming Models for Massively Parallel Computers.
[40] Cédric Bastoul,et al. Efficient code generation for automatic parallelization and optimization , 2003, Second International Symposium on Parallel and Distributed Computing, 2003. Proceedings.