Performance of Multi-cores and Multiprocessor Computers for Some 3D Problems of Nonlinear Optics and Gaseous Dynamics

We show that the processor architecture and the features of the computing platform must be taken into account to achieve maximal performance of a parallel code. To this end, we examine several processor designs used in the high-performance computing systems of our faculty. Two problems, second harmonic generation (SHG) and laser plume expansion, are chosen as benchmarks. For these problems we examine single-processor optimization techniques and compare the advantages of using libraries. In some cases the computation must be reorganized to take full advantage of the memory hierarchy. The total speedup from the optimizations proposed for sequential execution reaches up to 8 times on Intel architectures and up to 5.5 times on the IBM architecture. We also discuss the use of shared memory in parallel computation of the SHG problem, and we find a way to overcome the performance degradation that occurs as the number of processors increases.
