A programming system for Xeon Phis with runtime SIMD parallelization
[1] Pradeep Dubey,et al. 3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[2] Richard Veras,et al. When polyhedral transformations meet SIMD code generation , 2013, PLDI.
[3] Xing Liu,et al. Efficient sparse matrix-vector multiplication on x86-based many-core processors , 2013, ICS '13.
[4] Samuel Williams,et al. The Landscape of Parallel Computing Research: A View from Berkeley , 2006 .
[5] Samuel Williams,et al. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[6] Vipin Kumar,et al. Introduction to Data Mining , 2022, Data Mining and Machine Learning Applications.
[7] Bingsheng He,et al. Optimizing the MapReduce framework on Intel Xeon Phi coprocessor , 2013, 2013 IEEE International Conference on Big Data.
[8] Pradeep Dubey,et al. PALM: Parallel Architecture-Friendly Latch-Free Modifications to B+ Trees on Many-Core Processors , 2011, Proc. VLDB Endow..
[9] Gagan Agrawal,et al. Scheduling Methods for Accelerating Applications on Architectures with Heterogeneous Cores , 2014, 2014 IEEE International Parallel & Distributed Processing Symposium Workshops.
[10] Seonggun Kim,et al. Efficient SIMD code generation for irregular kernels , 2012, PPoPP '12.
[11] Kevin Skadron,et al. A Performance Study for Iterative Stencil Loops on GPUs with Ghost Zone Optimizations , 2011, International Journal of Parallel Programming.
[12] James R. Larus,et al. SIMD parallelization of applications that traverse irregular data structures , 2013, Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[13] Milind Girkar,et al. Compiling C/C++ SIMD Extensions for Function and Loop Vectorization on Multicore-SIMD Processors , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum.
[14] Stephen A. Jarvis,et al. Exploring SIMD for Molecular Dynamics, Using Intel® Xeon® Processors and Intel® Xeon Phi Coprocessors , 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[15] Bo Wu,et al. Complexity analysis and algorithm design for reorganizing data to minimize non-coalesced memory accesses on GPU , 2013, PPoPP '13.
[16] Xipeng Shen,et al. On-the-fly elimination of dynamic irregularities for GPU computing , 2011, ASPLOS XVI.
[17] P. Sadayappan,et al. High-performance code generation for stencil computations on GPU architectures , 2012, ICS '12.
[18] Siegfried Benkner,et al. Efficient Hybrid Execution of C++ Applications using Intel(R) Xeon Phi(TM) Coprocessor , 2012, ArXiv.
[19] Gagan Agrawal,et al. An execution strategy and optimized runtime support for parallelizing irregular reductions on modern GPUs , 2011, ICS '11.
[20] Anil K. Jain,et al. Algorithms for Clustering Data , 1988 .
[21] Vipin Kumar,et al. Introduction to Data Mining, (First Edition) , 2005 .
[22] M. Pharr,et al. ispc: A SPMD compiler for high-performance CPU programming , 2012, 2012 Innovative Parallel Computing (InPar).
[23] Samuel Williams,et al. Optimization of geometric multigrid for emerging multi- and manycore processors , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[24] Ayal Zaks,et al. Auto-vectorization of interleaved data for SIMD , 2006, PLDI '06.
[25] Richard Veras,et al. A stencil compiler for short-vector SIMD architectures , 2013, ICS '13.
[26] Franz Franchetti,et al. Data Layout Transformation for Stencil Computations on Short-Vector SIMD Architectures , 2011, CC.
[27] Peng Wu,et al. Vectorization for SIMD architectures with alignment constraints , 2004, PLDI '04.
[28] Pradeep Dubey,et al. FAST: fast architecture sensitive tree search on modern CPUs and GPUs , 2010, SIGMOD Conference.