Automatic OpenCL work-group size selection for multicore CPUs
Jaejin Lee | Sangmin Seo | Jun Lee | Gangwon Jo