Language-based vectorization and parallelization using intrinsics, OpenMP, TBB and Cilk Plus
[1] Ken Kennedy, et al. Optimizing Compilers for Modern Architectures: A Dependence-based Approach, 2001.
[2] James Reinders, et al. Intel Xeon Phi Coprocessor High Performance Programming, 2013.
[3] Przemyslaw Stpiczynski, et al. Efficient Language-Based Parallelization of Computational Problems Using Cilk Plus, 2017, PPAM.
[4] Ami Marowka. Parallel computing on any desktop, 2007, CACM.
[5] Rohit Chandra, et al. Parallel Programming in OpenMP, 2000.
[6] Arch D. Robison, et al. Composable Parallel Patterns with Intel Cilk Plus, 2013, Computing in Science & Engineering.
[7] A. Leist, et al. A Comparative Analysis of Parallel Programming Models for C, 2014.
[8] Ami Marowka. TBBench: A Micro-Benchmark Suite for Intel Threading Building Blocks, 2012, J. Inf. Process. Syst.
[9] Andrey Semin, et al. Optimizing HPC Applications with Intel® Cluster Tools, 2014, Apress.
[10] Rezaur Rahman, et al. Intel Xeon Phi Coprocessor Architecture and Tools: The Guide for Application Developers, 2013.
[11] Rezaur Rahman. Intel® Xeon Phi™ Coprocessor Architecture and Tools, 2013, Apress.
[12] R. K. Shyamasundar, et al. Introduction to algorithms, 1996.
[13] Christian Terboven, et al. Using OpenMP - The Next Step: Affinity, Accelerators, Tasking, and SIMD, 2017, Using OpenMP - The Next Step.
[14] Avinash Sodani, et al. Intel Xeon Phi Processor High Performance Programming: Knights Landing Edition 2nd Edition, 2016.
[15] James N. Lyness, et al. Notes on the Adaptive Simpson Quadrature Routine, 1969, J. ACM.
[16] Przemyslaw Stpiczynski. Semiautomatic Acceleration of Sparse Matrix-Vector Product Using OpenACC, 2015, PPAM.