Automatic SIMD Vectorization of SSA-based Control Flow Graphs
[1] R. Govindarajan, et al. A Vectorizing Compiler for Multimedia Extensions, 2000, International Journal of Parallel Programming.
[2] Yooseong Kim, et al. CuMAPz: A tool to analyze memory access patterns in CUDA, 2011, 2011 48th ACM/EDAC/IEEE Design Automation Conference (DAC).
[3] Stavros Tripakis, et al. Checking Equivalence of SPMD Programs Using Non-Interference, 2010.
[4] Mike Murphy, et al. Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs, 2010, CGO '10.
[5] Guodong Li, et al. Scalable SMT-based verification of GPU kernel functions, 2010, FSE '10.
[6] Randolph G. Scarborough, et al. A Vectorizing Fortran Compiler, 1986, IBM J. Res. Dev.
[7] M. Schlansker, et al. On Predicated Execution, 1991.
[8] Henk Corporaal, et al. Making graphs reducible with controlled node splitting, 1997, TOPLAS.
[9] Marc Olano. Modified noise for evaluation on graphics hardware, 2005, HWWS '05.
[10] Viet Nhu Ngo. Parallel loop transformation techniques for vector-based multiprocessor systems, 1995.
[11] Alejandro Duran, et al. Extending OpenMP* with Vector Constructs for Modern Multicore SIMD Architectures, 2012, IWOMP.
[12] Richard Henderson, et al. Multi-platform auto-vectorization, 2006, International Symposium on Code Generation and Optimization (CGO'06).
[13] Scott A. Mahlke, et al. SIMD defragmenter: efficient ILP realization on data-parallel architectures, 2012, ASPLOS XVII.
[14] Saman P. Amarasinghe, et al. Exploiting superword level parallelism with multimedia instruction sets, 2000, PLDI '00.
[15] Mattan Erez, et al. Maximizing SIMD resource utilization in GPGPUs with SIMD lane permutation, 2013, ISCA.
[16] Yi Yang, et al. A GPGPU compiler for memory optimization and parallelism management, 2010, PLDI '10.
[17] G. Ramalingam, et al. On loops, dominators, and dominance frontiers, 2002, TOPLAS.
[18] Sebastian Hack, et al. Whole-function vectorization, 2011, International Symposium on Code Generation and Optimization (CGO 2011).
[19] Guodong Li, et al. Performance Degradation Analysis of GPU Kernels, 2012.
[20] Denis Barthou, et al. On the decidability of phase ordering problem in optimizing compilation, 2006, CF '06.
[21] Sebastian Hack, et al. Sierra: a SIMD extension for C++, 2014, WPMVP '14.
[22] Philipp Slusallek, et al. AnySL: efficient and portable shading for ray tracing, 2010, HPG '10.
[23] Michael D. McCool, et al. Intel's Array Building Blocks: A retargetable, dynamic compiler and embedded language, 2011, International Symposium on Code Generation and Optimization (CGO 2011).
[24] Jaewook Shin. Introducing Control Flow into Vectorized Code, 2007, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).
[25] Francisco Vázquez, et al. A new approach for sparse matrix vector product on NVIDIA GPUs, 2011, Concurr. Comput. Pract. Exp.
[26] Yi Yang, et al. A unified optimizing compiler framework for different GPGPU architectures, 2012, TACO.
[27] Thomas Sturm, et al. Presburger Arithmetic in Memory Access Optimization for Data-Parallel Languages, 2013, FroCos.
[28] S. Boulos, et al. RTSL: a Ray Tracing Shading Language, 2007, 2007 IEEE Symposium on Interactive Ray Tracing.
[29] Scott A. Mahlke, et al. MacroSS: macro-SIMDization of streaming applications, 2010, ASPLOS XV.
[30] Robert E. Tarjan, et al. A fast algorithm for finding dominators in a flowgraph, 1979, TOPLAS.
[31] Ken Perlin, et al. Improving noise, 2002, SIGGRAPH.
[32] Milind Girkar, et al. Compiling C/C++ SIMD Extensions for Function and Loop Vectorization on Multicore-SIMD Processors, 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum.
[33] Stamatis Vassiliadis, et al. Performance Impact of Misaligned Accesses in SIMD Extensions, 2006.
[34] Ingo Wald, et al. Extending a C-like language for portable SIMD programming, 2012, PPoPP '12.
[35] Jaewook Shin, et al. Superword-level parallelism in the presence of control flow, 2005, International Symposium on Code Generation and Optimization.
[36] Krste Asanovic, et al. Convergence and scalarization for data-parallel architectures, 2013, Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[37] Jim X. Chen, et al. OpenGL Shading Language, 2009.
[38] Ken Kennedy, et al. Optimizing Compilers for Modern Architectures: A Dependence-based Approach, 2001.
[39] Edward S. Lowry, et al. Object code optimization, 1969, CACM.
[40] Vikram S. Adve, et al. LLVM: a compilation framework for lifelong program analysis & transformation, 2004, International Symposium on Code Generation and Optimization (CGO 2004).
[41] Des Watson, et al. A study of irreducibility in C programs, 2012, Softw. Pract. Exp.
[42] Wen-mei W. Hwu, et al. Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing, 2012.
[43] Seonggun Kim, et al. Efficient SIMD code generation for irregular kernels, 2012, PPoPP '12.
[44] Michael F. P. O'Boyle, et al. A large-scale cross-architecture evaluation of thread-coarsening, 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[45] Jarmo Takala, et al. OpenCL-based design methodology for application-specific processors, 2010, 2010 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation.
[46] Ayal Zaks, et al. Outer-loop vectorization: revisited for short SIMD architectures, 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[47] Ken Perlin, et al. [Computer Graphics]: Three-Dimensional Graphics and Realism, 2022.
[48] Hye-Sun Kim, et al. Cache-oblivious ray reordering, 2010, TOG.
[49] Ayal Zaks, et al. Auto-vectorization of interleaved data for SIMD, 2006, PLDI '06.
[50] Michael Wolfe, et al. High performance compilers for parallel computing, 1995.
[51] Volker Weispfenning. The Complexity of Almost Linear Diophantine Problems, 1990, J. Symb. Comput.
[52] Yosi Ben-Asher, et al. Block Unification IF-conversion for High Performance Architectures, 2014, IEEE Computer Architecture Letters.
[53] Sebastian Hack, et al. Improving Performance of OpenCL on CPUs, 2012, CC.
[54] Fernando Magno Quintão Pereira, et al. Divergence analysis, 2013, ACM Trans. Program. Lang. Syst.
[55] M. Pharr, et al. ispc: A SPMD compiler for high-performance CPU programming, 2012, 2012 Innovative Parallel Computing (InPar).
[56] Ingo Wald. Active thread compaction for GPU path tracing, 2011, HPG '11.
[57] Andreas Krall, et al. Compilation Techniques for Multimedia Processors, 2004, International Journal of Parallel Programming.
[58] Sid Touati, et al. The Speedup Test, 2010.
[59] Wen-mei W. Hwu, et al. MCUDA: An Efficient Implementation of CUDA Kernels for Multi-core CPUs, 2008, LCPC.
[60] John Sartori, et al. Branch and Data Herding: Reducing Control and Memory Divergence for Error-Tolerant GPU Applications, 2013, IEEE Trans. Multim.
[61] Volker Lindenstruth, et al. Vc: A C++ library for explicit vectorization, 2012, Softw. Pract. Exp.