SIMDization of Small Tensor Multiplication Kernels for Wide SIMD Vector Processors
Peng Wu | Amarin Phaosawasdi | Christopher I. Rodrigues