Optimizing Overlapped Memory Accesses in User-directed Vectorization
Alejandro Duran | Sara Royuela | Xavier Martorell | Diego Caballero | Roger Ferrer
[1] Alfred V. Aho,et al. Compilers: Principles, Techniques, and Tools (2nd Edition) , 2006 .
[2] David A. Padua,et al. Advanced compiler optimizations for supercomputers , 1986, CACM.
[3] Jim Jeffers,et al. Chapter 10 – Linux on the Coprocessor , 2013 .
[4] Alfred V. Aho,et al. Compilers: Principles, Techniques, and Tools , 1986, Addison-Wesley series in computer science / World student series edition.
[5] Nick Knupffer. Intel Corporation , 2018, The Grants Register 2019.
[6] Ayal Zaks,et al. Outer-loop vectorization - revisited for short SIMD architectures , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[7] Leonid Oliker,et al. Impact of modern memory subsystems on cache optimizations for stencil computations , 2005, MSP '05.
[8] John McCutchan,et al. A SIMD programming model for Dart, JavaScript, and other dynamically typed scripting languages , 2014, WPMVP '14.
[9] David A. Padua,et al. An Evaluation of Vectorizing Compilers , 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.
[10] Mauricio Hanzich,et al. 3D seismic imaging through reverse-time migration on homogeneous and heterogeneous multi-core processors , 2009, Sci. Program..
[11] Samuel Williams,et al. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[12] Saman P. Amarasinghe,et al. Exploiting superword level parallelism with multimedia instruction sets , 2000, PLDI '00.
[13] Peng Wu,et al. Vectorization for SIMD architectures with alignment constraints , 2004, PLDI '04.
[14] Albert Cohen,et al. Polyhedral-Model Guided Loop-Nest Auto-Vectorization , 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.
[15] Aart J. C. Bik. Software Vectorization Handbook, The: Applying Intel Multimedia Extensions for Maximum Performance , 2004 .
[16] Pradeep Dubey,et al. Can traditional programming bridge the Ninja performance gap for parallel computing applications? , 2015, 2012 39th Annual International Symposium on Computer Architecture (ISCA).
[17] Sebastian Hack,et al. Whole-function vectorization , 2011, International Symposium on Code Generation and Optimization (CGO 2011).
[18] Ingo Wald,et al. Extending a C-like language for portable SIMD programming , 2012, PPoPP '12.
[19] Alejandro Duran,et al. Mercurium: Design Decisions for a S2S Compiler , 2011 .
[20] Alejandro Duran,et al. Extending OpenMP* with Vector Constructs for Modern Multicore SIMD Architectures , 2012, IWOMP.
[21] Mauricio Araya-Polo,et al. Algorithm 942 , 2014 .
[22] Emre Kultursay,et al. Compiler-Based Data Prefetching and Streaming Non-temporal Store Generation for the Intel(R) Xeon Phi(TM) Coprocessor , 2013, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum.
[23] Samuel Williams,et al. Auto-Tuning the 27-point Stencil for Multicore , 2009 .
[24] Lionel Lacassagne,et al. High level transforms for SIMD and low-level computer vision algorithms , 2014, WPMVP '14.
[25] Jaewook Shin,et al. Compiler-controlled caching in superword register files for multimedia extension architectures , 2002, Proceedings. International Conference on Parallel Architectures and Compilation Techniques.
[26] Michael Wolfe,et al. Iteration Space Tiling for Memory Hierarchies , 1987, PPSC.
[27] Ayal Zaks,et al. Vectorizing for a SIMdD DSP architecture , 2003, CASES '03.
[29] Richard Veras,et al. When polyhedral transformations meet SIMD code generation , 2013, PLDI.
[30] James Reinders,et al. Intel Xeon Phi Coprocessor High Performance Programming , 2013 .
[31] Ken Kennedy,et al. Optimizing Compilers for Modern Architectures: A Dependence-based Approach , 2001 .