Characterizing Anomalies of a Multicore ARMv7 Cluster with Parallel N-Body Simulations

ARM processors are beginning to gain attention from the HPC community due to their performance and energy-efficiency characteristics. When developing HPC applications for such test beds, developers assume that the available compute resources are homogeneous. However, we observed anomalies when executing a relatively simple HPC application, an N-body simulation: one core in every available node exhibited variability in computation time, while the second core of each node did not. In this paper, we characterize these anomalies, observed on an 8-node multicore ARMv7 cluster. We also attempt to isolate and remove all possible sources of interference that could contribute to this unexpected behavior, including compilation directives, dynamic processor frequency scaling, and communication. Results show that the anomaly may be correlated with the architecture of the dual-core chip. We also analyze the effect of different MPI process deployments on total execution time and correlate it with the application and the test bed.
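
The paper's benchmark code is not reproduced here; the following is a minimal sketch, assuming an all-pairs O(N^2) force kernel timed per MPI rank with MPI_Wtime, of the kind of measurement that can expose the per-core variability described above. The body count, softening constant, and 2D setup are illustrative assumptions, not the authors' configuration.

```c
/* Minimal sketch (not the authors' code): one all-pairs N-body force
 * computation, timed independently on each MPI rank. On a homogeneous
 * cluster the per-rank times should be nearly identical; a rank that is
 * consistently slower hints at a core-level anomaly. */
#include <mpi.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

#define N 4096          /* bodies per rank (assumed) */
#define SOFTENING 1e-9f /* keeps the inverse distance finite */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    float *x  = malloc(N * sizeof(float)), *y  = malloc(N * sizeof(float));
    float *ax = malloc(N * sizeof(float)), *ay = malloc(N * sizeof(float));
    srand(rank + 1);
    for (int i = 0; i < N; i++) {
        x[i] = (float)rand() / RAND_MAX;
        y[i] = (float)rand() / RAND_MAX;
    }

    /* Time the O(N^2) force loop on this rank only. */
    double t0 = MPI_Wtime();
    for (int i = 0; i < N; i++) {
        float fx = 0.0f, fy = 0.0f;
        for (int j = 0; j < N; j++) {
            float dx = x[j] - x[i], dy = y[j] - y[i];
            float inv  = 1.0f / sqrtf(dx * dx + dy * dy + SOFTENING);
            float inv3 = inv * inv * inv;
            fx += dx * inv3;
            fy += dy * inv3;
        }
        ax[i] = fx;
        ay[i] = fy;
    }
    double t1 = MPI_Wtime();

    printf("rank %d: %.3f s\n", rank, t1 - t0);

    free(x); free(y); free(ax); free(ay);
    MPI_Finalize();
    return 0;
}
```

Running such a probe with explicit core binding (e.g., Open MPI's mpirun --bind-to core) pins each rank to a fixed core, so a consistently slow rank can be attributed to a specific core rather than to scheduler migration, which is the kind of isolation step the study performs.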
