A Scalable Infrastructure for Online Performance Analysis on CFD Application

Abstract The fast-growing demand of computational fluid dynamics (CFD) applications for computing resources drives the development of high performance computing (HPC) and, at the same time, raises new requirements for parallel application performance monitoring and analysis technology. Because CFD applications run at large scale and for long periods, online and scalable performance analysis is required to optimize the parallel programs and to improve their operational efficiency. This research therefore implements a scalable infrastructure for online performance analysis of CFD applications on homogeneous or heterogeneous systems. The infrastructure is part of the parallel application performance monitor and analysis system (PAPMAS) and is composed of two modules: a scalable data transmission module and a data storage module. The paper describes the design and implementation of this infrastructure in detail. Experiments are carried out to verify that the infrastructure is soundly designed and efficient, and that it can be adopted to meet practical needs.
