Throttling Automatic Vectorization: When Less is More