Performance modeling of distributed hybrid architectures

Hybrid architectures are systems in which a high-performance general-purpose computer is coupled to one or more special-purpose devices (SPDs). Such a system can be the optimal choice for several fields of computational science. Configuring the system and finding the optimal mapping of the application tasks onto the hybrid machine is often not straightforward. Performance modeling is a tool for tackling these problems. We have developed a performance model to simulate the behavior of a hybrid architecture consisting of a parallel multiprocessor in which some nodes host a GRAPE board. GRAPE is a very high-performance SPD used in computational astrophysics. We validate our model on the architecture at our disposal and show examples of the predictions that it can produce.
