Node-level Memory Access Optimization on Intel Knights Corner

Traditional Programming Optimization (TPO) has limited effect on Intel Knights Corner (KNC). We therefore propose Memory Access Optimization (MAO) for KNC. Applying MAO to the TPO version of Diffusion 3D improved performance by 39.1%. This paper makes two contributions: 1) we argue that MAO is indispensable on KNC and that TPO+MAO is the path to Ninja Performance, the best optimized performance; 2) we find that intrinsic-based MAO is more efficient for stencil code than compiler-based MAO. We expect these findings on MAO to inform the optimization of large-scale applications on KNC.
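The abstract does not spell out what MAO consists of. As a rough illustration only, the sketch below shows one kind of explicit memory access hint that could be applied to a diffusion-style 7-point stencil: a software prefetch issued a fixed distance ahead of the current element. This is an assumption suggested by the title of reference [1], not the paper's method. The function name `diffusion3d_step`, the helper `idx`, the macro `PREFETCH_READ`, the coefficients `cc` and `cn`, and the distance `PREFETCH_DIST` are all hypothetical, and the portable `__builtin_prefetch` builtin of GCC-compatible compilers stands in for KNC-specific intrinsics.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical prefetch distance in elements along x; a tuning knob,
   not a value taken from the paper. */
#define PREFETCH_DIST 16

/* Portable stand-in for a hardware prefetch intrinsic (assumption). */
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH_READ(p) __builtin_prefetch((p), 0, 3)
#else
#define PREFETCH_READ(p) ((void)0)
#endif

/* Linear index of cell (x, y, z) in an nx * ny * nz grid, x fastest. */
static size_t idx(int x, int y, int z, int nx, int ny)
{
    return ((size_t)z * (size_t)ny + (size_t)y) * (size_t)nx + (size_t)x;
}

/* One 7-point stencil sweep over the interior cells.
   Boundary cells of `out` are left untouched. */
void diffusion3d_step(const float *in, float *out,
                      int nx, int ny, int nz, float cc, float cn)
{
    assert(nx >= 3 && ny >= 3 && nz >= 3);

    for (int z = 1; z < nz - 1; ++z) {
        for (int y = 1; y < ny - 1; ++y) {
            const float *c = in + idx(0, y, z, nx, ny);     /* centre row  */
            const float *n = in + idx(0, y - 1, z, nx, ny); /* y - 1 row   */
            const float *s = in + idx(0, y + 1, z, nx, ny); /* y + 1 row   */
            const float *b = in + idx(0, y, z - 1, nx, ny); /* z - 1 plane */
            const float *t = in + idx(0, y, z + 1, nx, ny); /* z + 1 plane */
            float *o = out + idx(0, y, z, nx, ny);

            for (int x = 1; x < nx - 1; ++x) {
                /* Hint that data PREFETCH_DIST elements ahead will be read
                   soon; guarded so the address stays inside the row. */
                if (x + PREFETCH_DIST < nx) {
                    PREFETCH_READ(&c[x + PREFETCH_DIST]);
                    PREFETCH_READ(&b[x + PREFETCH_DIST]);
                    PREFETCH_READ(&t[x + PREFETCH_DIST]);
                }
                o[x] = cc * c[x]
                     + cn * (c[x - 1] + c[x + 1] + n[x] + s[x] + b[x] + t[x]);
            }
        }
    }
}
```

A prefetch is only a hint, so the numerical result is the same with or without it; whether it helps, and at what distance, depends on the hardware and has to be measured.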

[1] Emre Kultursay, et al. Compiler-Based Data Prefetching and Streaming Non-temporal Store Generation for the Intel(R) Xeon Phi(TM) Coprocessor, 2013, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum.

[2] Pradeep Dubey, et al. Can traditional programming bridge the Ninja performance gap for parallel computing applications?, 2015, 2012 39th Annual International Symposium on Computer Architecture (ISCA).