High‐level parallel programming in a heterogeneous world

During the last decade, parallel programming has evolved in an unprecedented way. Fifteen years ago, the future of parallel computing seemed to consist of multicore processors with an ever-growing number of cores per CPU, interconnected to form larger clusters. Programming models such as OpenMP [1], which allows sequential C and Fortran codes to be transformed into parallel versions with little effort, without explicitly managing threads or the memory shared among them, seemed to be the winning choice in a world where computers would include more and more cores inside a single chip. Message-passing paradigms such as MPI [2] provided a solution for interconnecting these computers in larger facilities.

Suddenly, the use of dedicated hardware, originally designed to render graphics, for general-purpose computation changed the rules of the game. The advent of GPUs as general-purpose computing devices and of their associated programming model, CUDA [3], steadily colonized the TOP500 supercomputing list [4]. The success of this computing model, composed of very simple computational units that are orchestrated to perform the same operations on an ever-larger number of data elements, made the landscape of supercomputing much more interesting. Since GPU programming is by no means an easy task, the last ten years have seen the rise of new programming models for heterogeneous systems, which aim to offer a unified view of the different computational units. These proposals intend to reduce the complexity of parallel programming while retaining the performance obtained by their manually programmed counterparts. To succeed, they must pass the test of time, being extensible enough to accommodate computing devices that do not yet exist. New computing architectures, such as the family of Xeon Phi coprocessors [5], pose new challenges to the designers of these high-level parallel programming models.

To raise the level of abstraction, parallel programming patterns have emerged as a way of expressing parallelism in existing sequential applications. Many algorithms map naturally onto parallel patterns, making them easy to exploit on heterogeneous parallel architectures [6]. Algorithmic skeletons have been commonly used to express parallel patterns since the 1990s [7]. They abstract the application functionality from the implementation details, allowing a common representation of algorithms across multiple, diverging parallel platforms.

It is in this context that the "High-Level Parallel Programming" (HLPP) conference takes place. Starting in 2001, HLPP has gathered specialists from all around the globe to discuss recent advances in developing better software frameworks that cope with this increasing level of complexity. The particular format of the HLPP conference, with informal proceedings distributed among its attendants, allows the presentation of cutting-edge, ongoing work that receives direct feedback from the audience, thus enriching the research carried out. In an era of big conferences that attract several hundred researchers who must compete for the attention of an audience scattered among different tracks running in parallel, the HLPP series of workshops/symposia keeps the spirit of specialist meetings, where all the speakers are equally accessible and where interesting discussions follow each presentation.
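To make the abstraction gap concrete, the sketch below shows a minimal "map" skeleton in C++ whose caller only states what to compute per element, while the skeleton decides how the iteration is parallelized (here with a simple OpenMP backend). This is an illustrative assumption made for this editorial, not code from any of the papers in this issue; names such as map_skeleton are hypothetical.

// Minimal sketch of an algorithmic "map" skeleton (illustrative only).
// Compile with, e.g., g++ -fopenmp; without OpenMP the pragma is ignored
// and the code runs sequentially.
#include <cstddef>
#include <iostream>
#include <vector>

template <typename In, typename Out, typename UnaryOp>
std::vector<Out> map_skeleton(const std::vector<In>& input, UnaryOp op) {
    std::vector<Out> output(input.size());
    // Implementation detail hidden behind the skeleton: a parallel-for loop.
    #pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(input.size()); ++i)
        output[i] = op(input[i]);
    return output;
}

int main() {
    std::vector<float> x{1.f, 2.f, 3.f, 4.f};
    // The user only expresses the per-element operation (the "map" pattern).
    auto y = map_skeleton<float, float>(x, [](float v) { return 2.f * v + 1.f; });
    for (float v : y) std::cout << v << ' ';
    std::cout << '\n';
    return 0;
}

Replacing the body of map_skeleton with, say, a GPU or distributed-memory implementation would leave the call site unchanged; this separation of the pattern from its implementation is the kind of abstraction that skeleton frameworks aim to provide.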
This special issue of Concurrency and Computation: Practice and Experience contains revised papers selected from those presented at the 10th International Symposium on High-Level Parallel Programming and Applications (HLPP 2017), held in Valladolid, Spain, on July 10-11, 2017. The program committee of HLPP 2017 accepted 14 papers out of 20 full-paper submissions, covering both foundational and practical issues in high-level parallel programming and applications. Authors of several selected papers were invited to submit extended versions to this special issue. Nine papers went through the journal's regular peer-review process and, finally, seven papers were selected for publication in this special issue [8-14].

References

[1] L. Dagum, R. Menon: OpenMP: An Industry Standard API for Shared-Memory Programming. IEEE Computational Science & Engineering, 1998.
[2] W. Gropp, E. Lusk, A. Skjellum: Using MPI: Portable Parallel Programming with the Message-Passing Interface, 3rd edn. MIT Press, 2014.
[3] J. Sanders, E. Kandrot: CUDA by Example: An Introduction to General-Purpose GPU Programming. Addison-Wesley, 2010.
[4] TOP500 supercomputer sites. https://www.top500.org
[5] J. Jeffers, J. Reinders: Intel Xeon Phi Coprocessor High Performance Programming. Morgan Kaufmann, 2013.
[6] M. McCool, A. D. Robison, J. Reinders: Structured Parallel Programming: Patterns for Efficient Computation. Morgan Kaufmann, 2012.
[7] M. Cole: Parallel Skeletons. In: Encyclopedia of Parallel Computing. Springer, 2011.
[8] A. González-Escribano et al.: Automatic runtime calculation of communications for data-parallel expressions with periodic conditions. Concurrency and Computation: Practice and Experience, 2019.
[9] S. Peri et al.: STMs in practice: Partial rollback vs pure abort mechanisms. Concurrency and Computation: Practice and Experience, 2019.
[10] F. Wolf et al.: Dissecting sequential programs for parallelization – An approach based on computational units. Concurrency and Computation: Practice and Experience, 2018.
[11] G. Mencagli et al.: Power-aware pipelining with automatic concurrency control. Concurrency and Computation: Practice and Experience, 2019.
[12] C. W. Kessler et al.: Extending smart containers for data locality-aware skeleton programming. Concurrency and Computation: Practice and Experience, 2019.
[13] R. Rocha et al.: Multi-dimensional lock-free arrays for multithreaded mode-directed tabling in Prolog. Concurrency and Computation: Practice and Experience, 2019.
[14] S. Gorlatch et al.: ATF: A generic directive-based auto-tuning framework. Concurrency and Computation: Practice and Experience, 2019.