Bringing Parallel Patterns Out of the Corner

High-level parallel programming is an active research topic aimed at promoting parallel programming methodologies that provide the programmer with high-level abstractions to develop complex parallel software with reduced time to solution. Pattern-based parallel programming is based on a set of composable and customizable parallel patterns used as basic building blocks in parallel applications. In recent years, considerable effort has been made to equip this programming model with features that overcome the shortcomings of early approaches concerning flexibility and performance. In this article, we demonstrate that the approach is sufficiently flexible and efficient by applying it to 12 out of 13 PARSEC applications. Our analysis, conducted on three different multicore architectures, demonstrates that pattern-based parallel programming has reached a good level of maturity, providing performance comparable to both parallel programming methodologies based on pragma-based annotations (i.e., OpenMP and OmpSs) and native implementations (i.e., Pthreads). Regarding programming effort, we also demonstrate a considerable reduction in lines of code and code churn compared to Pthreads, and comparable results with respect to other existing implementations.
