High performance computing and simulation: architectures, systems, algorithms, technologies, services, and applications

In many areas, high-performance computing (HPC) and simulation have become determinants of industrial competitiveness and advanced research. Following the progress made in the aerospace, automotive, environmental, energy, healthcare, and networking industries, most research domains nowadays measure the strategic importance of their developments vis-à-vis their mastery of these critical technologies. Intensive computing and numerical simulation are now essential tools that contribute to the success of system designs and to the effectiveness of public policies, such as the prevention of natural hazards and the management of climate risks, but also to security and national sovereignty. Yet the fact that simulation is employed by a large number of users does not mean that they all contribute equally to the advancement of this science.

It is widely anticipated that continued progress and investment in HPC and simulation will bring about innovations and technologies that contribute to growth and evolution in all major scientific domains. For instance, the simulation of complex phenomena, such as biological and living systems, promises spectacular scientific breakthroughs. In terms of hardware and software architectures, we can expect exaflop performance [1] to be reached before 2020. Exascale computing nevertheless remains an inspiring challenge, posing difficult but invigorating technological obstacles. The arrival of general-purpose graphics processing units (GP-GPUs) has accelerated the pace of improvement in peak performance. This development, however, implies rethinking how such architectures are used in order to approach peak performance whenever possible. In some cases, adapting existing applications to these technologies will require significant effort; at the same time, the technologies will influence the design of future applications and will require acquiring and building new tools and infrastructure [2–5].

HPC has so far been a laboratory for the development of techniques, technologies, services, and applications that sooner or later end up in consumer desktop computers. Today's desktops and laptops offer vector processing capabilities through Streaming SIMD Extensions (SSE) instructions, reminiscent of what Cray proposed in the seventies; Advanced Vector Extensions (AVX) have also been available since the introduction of Intel's Sandy Bridge processor (a short intrinsics sketch follows below). Equally, the introduction of the personal 'super-computer' in 2008, with NVIDIA's Tesla boards delivering one teraflop in single precision, changed the way we think about HPC [6]. Such components have since been incorporated into the design of supercomputers and clusters [7]. At the time of the High Performance Computing and Simulation (HPCS) 2010 conference, three of the top five supercomputers in the Top500 list [8] were hybrid systems, some with Tesla boards and others with the Fermi architecture, which considerably improved double-precision performance [9,10]. At the time of writing this editorial, dual-GPU boards with thousands of cores are available, and an IBM BlueGene/Q system named Sequoia has recently been installed at the Department of Energy's Lawrence Livermore National Laboratory. This supercomputer achieved 16.32 petaflop/s on the Linpack benchmark using 1,572,864 cores, and it is also one of the most energy-efficient systems in the Top500 list.
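To make that trickle-down concrete, the sketch below shows the kind of vector code that commodity processors now support. It is a minimal illustration in C using AVX intrinsics, assuming an AVX-capable CPU and compiler (for example, gcc -mavx); the function and variable names are our own and do not come from any paper in this issue.

/* Minimal sketch of desktop-class vector processing with AVX intrinsics.
 * Assumes an AVX-capable CPU and compiler (e.g., gcc -mavx vec_add.c).
 * All names here are illustrative choices, not taken from this issue. */
#include <immintrin.h>
#include <stdio.h>

/* Add two float arrays eight elements at a time using 256-bit AVX registers.
 * For brevity, n is assumed to be a multiple of 8. */
static void vector_add(const float *a, const float *b, float *c, int n)
{
    for (int i = 0; i < n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);  /* load 8 floats */
        __m256 vb = _mm256_loadu_ps(b + i);
        __m256 vc = _mm256_add_ps(va, vb);   /* 8 additions in one instruction */
        _mm256_storeu_ps(c + i, vc);
    }
}

int main(void)
{
    float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[8] = {8, 7, 6, 5, 4, 3, 2, 1};
    float c[8];

    vector_add(a, b, c, 8);
    for (int i = 0; i < 8; i++)
        printf("%.0f ", c[i]);  /* prints: 9 9 9 9 9 9 9 9 */
    printf("\n");
    return 0;
}

A single _mm256_add_ps performs eight single-precision additions at once, the same data-parallel idea that Cray-style vector machines pioneered.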
Next year, supercomputers are expected to be more energy efficient while surpassing the 20-petaflop milestone, and we anticipate even higher peak performance and efficiency in subsequent years. In addition, the introduction in 2011 of Intel's Sandy Bridge [11] and of AMD's Accelerated Processing Unit (APU) [12] will also affect the way we design and program these new parallel architectures. These exciting developments also pose challenges that we will have to address. General-purpose many-core architectures will arrive around 2013 with the commercial availability of the Intel Many Integrated Core (MIC) architecture [13,14] (Xeon Phi is the final name retained by Intel for the commercialization of this architecture).
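Part of the appeal of the MIC approach is that it targets standard shared-memory programming models rather than a new language: in principle, plain OpenMP code can be recompiled to exploit the coprocessor's many cores. The following minimal sketch, again in C, illustrates that model with an ordinary OpenMP loop; the saxpy kernel and all names in it are our own hedged illustration and assume nothing beyond an OpenMP-capable compiler (for example, gcc -fopenmp).

/* Minimal sketch of the "same code, more cores" model that many-core
 * architectures such as MIC target: ordinary OpenMP, simply recompiled.
 * Assumes an OpenMP-capable compiler (e.g., gcc -fopenmp saxpy_omp.c).
 * The saxpy example and all names are illustrative, not from this issue. */
#include <omp.h>
#include <stdio.h>

#define N 1000000

static float x[N], y[N];

int main(void)
{
    const float a = 2.0f;

    for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    /* The same loop runs unchanged on a multicore host or, recompiled,
     * on a many-core device; only the available thread count differs. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        y[i] = a * x[i] + y[i];

    printf("threads available: %d, y[0] = %.1f\n",
           omp_get_max_threads(), y[0]);  /* y[0] = 4.0 */
    return 0;
}

The same loop scales from a handful of desktop cores to the dozens of hardware threads of a many-core device simply by changing where it is compiled and run.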

[1] Luca Benini et al. SIMinG-1k: A thousand-core simulator running on general-purpose graphical processing units. Concurrency and Computation: Practice and Experience, 2013.

[2] Michael Klemm et al. From GPGPU to Many-Core: Nvidia Fermi and Intel Many Integrated Core Architecture. Computing in Science & Engineering, 2012.

[3] Naohito Nakasato et al. A compiler for high performance computing with many-core accelerators. IEEE International Conference on Cluster Computing and Workshops, 2009.

[4] Schahram Dustdar et al. Cloud resource provisioning and SLA enforcement via LoM2HiS framework. Concurrency and Computation: Practice and Experience, 2013.

[5] Mamadou Kaba Traoré et al. Distribution of random streams for simulation practitioners. Concurrency and Computation: Practice and Experience, 2013.

[6] Sid Ahmed Ali Touati et al. The Speedup-Test: a statistical methodology for programme speedup analysis and computation. Concurrency and Computation: Practice and Experience, 2013.

[7] Luca Benini et al. SIMinG-1k: A Thousand-Core Simulator running on GPGPUs. 2012.

[8] Paulo F. Flores et al. Configurable and scalable class of high performance hardware accelerators for simultaneous DNA sequence alignment. Concurrency and Computation: Practice and Experience, 2013.

[9] Orion S. Lawlor. Message passing for GPGPU clusters: CudaMPI. IEEE International Conference on Cluster Computing and Workshops, 2009.

[10] John E. Stone et al. GPU clusters for high-performance computing. IEEE International Conference on Cluster Computing and Workshops, 2009.

[11] Rajkumar Buyya et al. High-Performance Cloud Computing: A View of Scientific Applications. 10th International Symposium on Pervasive Systems, Algorithms, and Networks, 2009.

[12] Sébastien Limet et al. A scalable parallel minimum spanning tree algorithm for catchment basin delimitation in large digital elevation models. Concurrency and Computation: Practice and Experience, 2013.

[13] Sadaf R. Alam et al. Performance modeling of microsecond scale biological molecular dynamics simulations on heterogeneous architectures. Concurrency and Computation: Practice and Experience, 2013.

[14] Franck Cappello et al. HPCS 2013 panel: The era of exascale sciences: Challenges, needs and requirements. HPCS, 2013.

[15] Jack Dongarra. Architecture-Aware Algorithms for Scalable Performance and Resilience on Heterogeneous Architectures. 2013.

[16] Desh Ranjan et al. High performance implementation of planted motif problem using suffix trees. International Conference on High Performance Computing & Simulation, 2011.

[17] Orion Sky Lawlor. Embedding OpenCL in C++ for Expressive GPU Programming. 2011.

[18] Andrzej Nowak et al. Evaluation of the Intel Sandy Bridge-EP server processor. 2012.

[19] P. Glaskowsky. NVIDIA's Fermi: The First Complete GPU Computing Architecture. 2009.

[20] Lex Wolters et al. Graphics processing unit optimizations for the dynamics of the HIRLAM weather forecast model. Concurrency and Computation: Practice and Experience, 2013.

[21] Constantinos Evangelinos et al. Cloud Computing for parallel Scientific HPC Applications: Feasibility of Running Coupled Atmosphere-Ocean Climate Models on Amazon's EC2. 2008.

[22] James W. Demmel. Architecture-Aware Algorithms for Scalable Performance and Resilience on Heterogeneous Architectures. 2013.

[23] M. S. Waterman et al. Identification of common molecular subsequences. Journal of Molecular Biology, 1981.

[24] Francesco Iorio et al. Leveraging cloud computing and high performance computing advances for next-generation architecture, urban design and construction projects. SpringSim, 2011.

[25] Desh Ranjan et al. High-performance implementation of planted motif problem on multicore and GPU. Concurrency and Computation: Practice and Experience, 2013.

[26] Emmett Kilgariff et al. Fermi GF100 GPU Architecture. IEEE Micro, 2011.

[27] Stephen P. Crago et al. Enabling Resilience through Introspection and Virtualization. 2012.

[28] J. Walters et al. Virtualized Cloud Computing for Exascale Performance. 2012.

[29] Theron Voran et al. Evaluating Intel's Many Integrated Core Architecture for Climate Science. 2012.

[30] Eugenio Cesario et al. Programming knowledge discovery workflows in service-oriented distributed systems. Concurrency and Computation: Practice and Experience, 2013.

[31] Parimala Thulasiraman et al. Designing APU Oriented Scientific Computing Applications in OpenCL. IEEE International Conference on High Performance Computing and Communications, 2011.

[32] William Gropp et al. Exascale Research: Preparing for the Post-Moore Era. 2011.