Reactive tiling
[1] Per Stenström,et al. An Adaptive Shared/Private NUCA Cache Partitioning Scheme for Chip Multiprocessors , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[2] Ling Shao,et al. DMATiler: Revisiting loop tiling for direct memory access , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[3] Monica S. Lam,et al. An affine partitioning algorithm to maximize parallelism and minimize communication , 1999, ICS '99.
[4] James Demmel,et al. Performance models for evaluation and automatic tuning of symmetric sparse matrix-vector multiply , 2004, International Conference on Parallel Processing, 2004. ICPP 2004.
[5] Jichuan Chang,et al. Cooperative cache partitioning for chip multiprocessors , 2007, ICS '07.
[6] S. Kim,et al. Fair cache sharing and partitioning in a chip multiprocessor architecture , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004.
[7] A. Snavely,et al. Symbiotic jobscheduling for a simultaneous multithreading processor , 2000, SIGP.
[8] Kathryn S. McKinley,et al. Tile size selection using cache organization and data layout , 1995, PLDI '95.
[9] Rudolf Eigenmann,et al. SPEComp: A New Benchmark Suite for Measuring Parallel Computer Performance , 2001, WOMPAT.
[10] Mahmut T. Kandemir,et al. Improving locality using loop and data transformations in an integrated framework , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[11] François Irigoin,et al. Supernode partitioning , 1988, POPL '88.
[12] J. Ramanujam,et al. Adaptive parallel tiled code generation and accelerated auto-tuning , 2013, Int. J. High Perform. Comput. Appl.
[13] Xing Zhou,et al. Hierarchical overlapped tiling , 2012, CGO '12.
[14] Corinne Ancourt,et al. Scanning polyhedra with DO loops , 1991, PPOPP '91.
[15] Chen Ding,et al. Defensive loop tiling for shared cache , 2013, Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[16] G. Edward Suh,et al. Dynamic Cache Partitioning for Simultaneous Multithreading Systems , 2004.
[17] Vivek Sarkar,et al. An analytical model for loop tiling and its solution , 2000, 2000 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[18] Michael Wolfe,et al. More iteration space tiling , 1989, Proceedings of the 1989 ACM/IEEE Conference on Supercomputing (Supercomputing '89).
[19] Hui Wu,et al. Parallelizing SOR for GPGPUs using alternate loop tiling , 2012, Parallel Comput.
[20] G. Edward Suh,et al. Analytical cache models with applications to cache partitioning , 2001, ICS '01.
[21] Monica S. Lam,et al. The cache performance and optimizations of blocked algorithms , 1991, ASPLOS IV.
[22] Yale N. Patt,et al. Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[23] J. Ramanujam,et al. Parameterized tiling revisited , 2010, CGO '10.
[24] Monica S. Lam,et al. Data and computation transformations for multiprocessors , 1995, PPOPP '95.
[25] Yan Solihin,et al. Predicting inter-thread cache contention on a chip multi-processor architecture , 2005, 11th International Symposium on High-Performance Computer Architecture.
[26] Srihari Makineni,et al. Communist, Utilitarian, and Capitalist cache policies on CMPs: Caches as a shared resource , 2006, 2006 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[27] Sriram Krishnamoorthy,et al. Parametric multi-level tiling of imperfectly nested loops , 2009, ICS.
[28] John Turek,et al. Optimal Partitioning of Cache Memory , 1992, IEEE Trans. Computers.
[29] Engin Ipek,et al. Coordinated management of multiple interacting resources in chip multiprocessors: A machine learning approach , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.
[30] Uday Bondhugula,et al. A practical automatic polyhedral parallelizer and locality optimizer , 2008, PLDI '08.
[31] Mikel Luján,et al. Adaptive Loop Tiling for a Multi-cluster CMP , 2008, ICA3PP.
[32] Dimitrios S. Nikolopoulos. Dynamic tiling for effective use of shared caches on multithreaded processors , 2004, Int. J. High Perform. Comput. Netw.
[33] Cédric Bastoul,et al. Switchable Scheduling for Runtime Adaptation of Optimization , 2014, Euro-Par.