Domain-specific language & support tools for high-level stream parallelism

Stream-based systems represent many application domains, for example video, audio, graphics, and network processing. Programs that process a continuous stream of data can run on different kinds of parallel architectures (workstations, servers, mobile phones, and supercomputers) and account for significant workloads on today's computing systems. Even so, most of them are still not parallelized. Moreover, when new software has to be developed, programmers must deal with solutions that offer little in terms of coding productivity, code portability, and performance. To address this problem, we provide a new domain-specific language (DSL) that naturally captures and represents parallelism for stream-based applications. The aim is to offer a set of attributes (through annotations) that preserve the program's source code and are architecture-independent when annotating parallelism. In this study, we used the C++ attribute mechanism to design an embedded DSL that is standardized with the host language, which we named SPar. However, implementing DSLs with compiler-based tools is difficult, complicated, and usually involves a steep learning curve. It is even harder for those who are not familiar with compiler technology. Our motivation is therefore to simplify this path for other researchers (experts in their own domain) with support tools (the tool is called CINCLE) for implementing productive, high-level DSLs through powerful and aggressive source-to-source transformations. In practice, developers who write parallel programs can apply their expertise without having to design and implement the low-level code.
The main goal of this thesis was to create a DSL and support tools for high-level stream parallelism in the context of a programming framework that is compiler-based and domain-oriented. Accordingly, SPar was created using CINCLE. SPar supports the software developer with productivity, performance, and code portability, while CINCLE provides the support needed to generate new DSLs. In addition, SPar targets source-to-source transformation, producing parallel pattern code on top of FastFlow and MPI. Finally, we present a complete set of experiments demonstrating that SPar provides better coding productivity without significantly degrading performance on multi-core systems, as well as transformation rules that are able to achieve code portability (for multi-computer architectures) through its generic attributes.
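
The idea of annotating parallelism with C++ attributes while preserving the sequential source can be illustrated with a minimal sketch. The attribute names used here (`spar::ToStream`, `spar::Stage`, `spar::Input`, `spar::Output`, `spar::Replicate`) and the function `process_stream` are illustrative assumptions, since the abstract does not list the actual attribute set. The sketch only shows that such annotations can leave the program valid sequential C++: mainstream compilers typically ignore attributes they do not recognize, possibly with a warning. It does not show the source-to-source transformation or the generated FastFlow or MPI code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential stream loop carrying illustrative (assumed) SPar-style
// annotations. A compiler that does not know the `spar` namespace is
// expected to ignore these attributes, so the code runs sequentially.
std::vector<int> process_stream(const std::vector<int>& input) {
    std::vector<int> output;
    std::size_t i = 0;

    // Assumed annotation: marks the loop as the stream region.
    [[spar::ToStream, spar::Input(i), spar::Output(output)]]
    while (i < input.size()) {
        int item = input[i++];  // read the next stream element

        // Assumed annotation: a computation stage that could be replicated.
        [[spar::Stage, spar::Input(item), spar::Output(item), spar::Replicate(4)]]
        { item = item * 2; }

        // Assumed annotation: a final stage that collects the results.
        [[spar::Stage, spar::Input(item, output)]]
        { output.push_back(item); }
    }

    // Sequential postcondition: one output element per input element.
    assert(output.size() == input.size());
    return output;
}
```

Calling `process_stream` on a vector of integers returns a vector with each element doubled, in the original order. Keeping the annotations separate from the loop body is what would let a source-to-source tool, under these assumptions, rewrite the annotated regions into parallel pattern code without the programmer changing the algorithm itself.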
