Paper information - GRAPE-MP: An SIMD Accelerator Board for Multi-precision Arithmetic

GRAPE-MP: An SIMD Accelerator Board for Multi-precision Arithmetic

Abstract: We describe the design and performance of the GRAPE-MP board, an SIMD accelerator board for quadruple-precision arithmetic operations. A GRAPE-MP board houses one GRAPE-MP processor chip and an FPGA chip that handles communication with the host computer. A GRAPE-MP chip has 6 processing elements (PEs) and operates at a clock frequency of 100 MHz. Each PE can perform one addition and one multiplication in every clock cycle. The architecture of the GRAPE-MP is similar to that of the GRAPE-DR. It is implemented using a structured ASIC chip from eASIC Corp. A GRAPE-MP processor board has a theoretical peak quadruple-precision performance of 1.2 Gflops. As a preliminary result, we present the performance of the GRAPE-MP board for two target applications. The numerical integration of Feynman loop integrals runs at 0.53 Gflops. An N-body simulation with the second-order leapfrog scheme runs at 0.505 Gflops for N = 1984, which is more than 10 times faster than the host computer.
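The peak figure quoted in the abstract follows from the other numbers given there: 6 PEs, each completing one addition and one multiplication per cycle, at 100 MHz. The short Python sketch below reproduces that arithmetic and expresses the two measured results as fractions of peak. The function and variable names are illustrative assumptions, not anything taken from the paper, and the efficiency ratios are derived here rather than reported by the authors.

```python
# Figures taken from the abstract; all names below are hypothetical.
NUM_PES = 6                 # processing elements per GRAPE-MP chip
CLOCK_HZ = 100e6            # 100 MHz clock
OPS_PER_PE_PER_CYCLE = 2    # one addition + one multiplication per cycle


def peak_gflops(num_pes, clock_hz, ops_per_cycle):
    """Theoretical peak in Gflops: PEs x operations per cycle x clock rate."""
    return num_pes * ops_per_cycle * clock_hz / 1e9


def efficiency(measured_gflops, peak):
    """Fraction of the theoretical peak achieved by a measured result."""
    return measured_gflops / peak


peak = peak_gflops(NUM_PES, CLOCK_HZ, OPS_PER_PE_PER_CYCLE)
print(f"theoretical peak: {peak:.1f} Gflops")

# Measured values as stated in the abstract.
measured = {
    "Feynman loop integration": 0.53,
    "N-body leapfrog, N = 1984": 0.505,
}
for name, gflops in measured.items():
    print(f"{name}: {gflops} Gflops, {efficiency(gflops, peak):.1%} of peak")
```

Under these assumptions, both applications reach a little over 40% of the theoretical peak.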
