Directive-Based Auto-Tuning for the Finite Difference Method on the Xeon Phi

In this paper, we present a directive-based auto-tuning (AT) framework, called ppOpen-AT, and demonstrate its effect using simulation code based on the Finite Difference Method (FDM). The framework utilizes well-known loop transformation techniques. However, the codes used are carefully designed to minimize the software stack in order to meet the requirements of a many-core architecture currently in operation. The results of evaluations conducted using ppOpen-AT indicate that maximum speedup factors greater than 550% are obtained when it is applied in eight nodes of the Intel Xeon Phi. Further, in the AT for data packing and unpacking, a 49% speedup factor for the whole application is achieved. By using it with strong scaling on 32 nodes in a cluster of the Xeon Phi, we also obtain 24% speedups for the overall execution.

[1]  David A. Padua,et al.  Advanced compiler optimizations for supercomputers , 1986, CACM.

[2]  John Zahorjan,et al.  Improving the performance of runtime parallelization , 1993, PPOPP '93.

[3]  Cédric Augonnet,et al.  PEPPHER: Efficient and Productive Usage of Hybrid Computing Systems , 2011, IEEE Micro.

[4]  Walter F. Tichy,et al.  Atune-IL: An Instrumentation Language for Auto-tuning Parallel Applications , 2009, Euro-Par.

[5]  Katherine Yelick,et al.  OSKI: A library of automatically tuned sparse matrix kernels , 2005 .

[6]  Yuefan Deng,et al.  New trends in high performance computing , 2001, Parallel Computing.

[7]  Chun Chen,et al.  Loop Transformation Recipes for Code Generation and Auto-Tuning , 2009, LCPC.

[8]  Steven G. Johnson,et al.  FFTW: an adaptive software architecture for the FFT , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[9]  David A. Padua,et al.  SPL: a language and compiler for DSP algorithms , 2001, PLDI '01.

[10]  Takahiro Katagiri,et al.  ABCLibScript: a directive to support specification of an auto-tuning facility for numerical software , 2006, Parallel Comput..

[11]  Richard W. Vuduc,et al.  Sparsity: Optimization Framework for Sparse Matrix Kernels , 2004, Int. J. High Perform. Comput. Appl..

[12]  Richard W. Vuduc,et al.  POET: Parameterized Optimizations for Empirical Tuning , 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.

[13]  David A. Padua,et al.  A Language for the Compact Representation of Multiple Program Versions , 2005, LCPC.

[14]  Chun Chen,et al.  Improving High-Performance Sparse Libraries Using Compiler-Assisted Specialization: A PETSc Case Study , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum.

[15]  Ananta Tiwari,et al.  Online Adaptive Code Generation and Tuning , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.

[16]  Takahiro Katagiri,et al.  Auto-tuning of Computation Kernels from an FDM Code with ppOpen-AT , 2014, 2014 IEEE 8th International Symposium on Embedded Multicore/Manycore SoCs.

[17]  Jeffrey S. Vetter,et al.  Autopilot: adaptive control of distributed applications , 1998, Proceedings. The Seventh International Symposium on High Performance Distributed Computing (Cat. No.98TB100244).

[18]  Peter F. Sweeney,et al.  Multiple page size modeling and optimization , 2005, 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05).

[19]  Takahiro Katagiri,et al.  FIBER: A Generalized Framework for Auto-tuning Software , 2003, ISHPC.

[20]  Takahiro Katagiri,et al.  ABCLib_DRSSED: A parallel eigensolver with an auto-tuning facility , 2006, Parallel Comput..

[21]  Li Chen,et al.  Parallel simulation of strong ground motions during recent and historical damaging earthquakes in Tokyo, Japan , 2005, Parallel Comput..

[22]  Lawrence Rauchwerger,et al.  The R-LRPD test: speculative parallelization of partially parallel loops , 2002, Proceedings 16th International Parallel and Distributed Processing Symposium.

[23]  Satoshi Ito,et al.  Early Experiences for Adaptation of Auto-tuning by ppOpen-AT to an Explicit Method , 2013, 2013 IEEE 7th International Symposium on Embedded Multicore Socs.