Paper information - Domain-specific language & support tools for high-level stream parallelism


Stream-based systems cover many application domains, for example video, audio, graphics, and network processing. Programs that process continuous data streams can run on different kinds of parallel architectures (workstations, servers, mobile phones, and supercomputers) and account for a significant share of the workload on today's computing systems. Even so, most of them are still not parallelized. Moreover, when new software has to be developed, programmers must deal with solutions that offer little coding productivity, code portability, and performance. To address this problem, we propose a new domain-specific language (DSL) that naturally captures and represents parallelism for stream-based applications. The goal is to offer a set of attributes (through annotations) for expressing parallelism that preserve the program's source code and are architecture-independent. In this study, the C++ attribute mechanism was used to design an embedded DSL, named SPar, that is standardized with the host language. However, implementing DSLs with compiler-based tools is difficult and complicated, and usually involves a significant learning curve. It is even harder for those who are not familiar with compiler technology. The motivation is therefore to simplify this path for other researchers (experts in their own domains) with support tools (the tool is called CINCLE) for implementing productive, high-level DSLs through powerful and aggressive source-to-source transformations. In practice, developers who write parallel programs can apply their skills without having to design and implement the low-level code.
The main goal of this thesis was to create a DSL and support tools for high-level stream parallelism in the context of a programming framework that is compiler-based and domain-oriented. Thus, SPar was created using CINCLE. SPar supports the software developer with productivity, performance, and code portability, while CINCLE provides the support needed to generate new DSLs. SPar also targets source-to-source transformation, producing parallel-pattern code on top of FastFlow and MPI. Finally, a complete set of experiments demonstrates that SPar provides better coding productivity without significantly degrading performance on multi-core systems, and that its transformation rules achieve code portability (to multi-computer architectures) through its generic attributes.
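To make the idea of annotating a sequential stream loop with C++ attributes more concrete, the following is a minimal sketch. The attribute names in the `spar` namespace (`ToStream`, `Stage`, `Input`, `Output`, `Replicate`) are assumptions used for illustration; the abstract does not list them. The function `square_stream` is hypothetical. A standard C++ compiler simply ignores the unknown attributes, so the code below runs sequentially; only a source-to-source tool such as the one the abstract describes would turn the annotated regions into parallel stages.

```cpp
#include <cstddef>
#include <vector>

// Standard compilers ignore attributes in an unknown namespace, usually with
// a warning; this pragma only silences that warning on GCC and Clang.
#pragma GCC diagnostic ignored "-Wattributes"

// Hypothetical example: square every item of an input stream.
// All [[spar::...]] annotations are illustrative assumptions.
std::vector<int> square_stream(const std::vector<int>& input) {
    std::vector<int> output;
    std::size_t i = 0;

    // Assumed meaning: the loop body is a stream region; each iteration
    // produces one stream item.
    [[spar::ToStream, spar::Input(i, input), spar::Output(output)]]
    while (i < input.size()) {
        int item = input[i++];

        // Assumed meaning: a stateless stage that may be replicated.
        [[spar::Stage, spar::Input(item), spar::Output(item), spar::Replicate(4)]]
        { item = item * item; }

        // Assumed meaning: a final stage that collects the results.
        [[spar::Stage, spar::Input(item, output)]]
        { output.push_back(item); }
    }
    return output;
}
```

Calling `square_stream({1, 2, 3})` with an ordinary compiler returns the squares in input order, which illustrates the point about preserving the source code: removing or ignoring the annotations leaves the sequential program unchanged.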

Dalvan Griebler
