Stateful dataflow multigraphs: a data-centric model for performance portability on heterogeneous architectures
Torsten Hoefler | Timo Schneider | Johannes de Fine Licht | Tal Ben-Nun | Alexandros Nikolaos Ziogas
[1] Guido van Rossum,et al. Python Programming Language , 2007, USENIX Annual Technical Conference.
[2] Shoaib Kamil,et al. Tiramisu: A Polyhedral Compiler for Expressing Fast and Portable Code , 2018, 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[3] Keshav Pingali,et al. A lightweight infrastructure for graph analytics , 2013, SOSP.
[4] Nancy M. Amato,et al. STAPL: An Adaptive, Generic Parallel C++ Library , 2001, LCPC.
[5] Vivek Sarkar,et al. Polyhedral Optimizations for a Data-Flow Graph Language , 2015, LCPC.
[6] Joe D. Warren,et al. The program dependence graph and its use in optimization , 1987, TOPL.
[7] Jin Zhou,et al. Bamboo: a data-centric, object-oriented approach to many-core software , 2010, PLDI '10.
[8] Hartmut Kaiser,et al. HPX: A Task Based Programming Model in a Global Address Space , 2014, PGAS.
[9] Shoaib Kamil,et al. Tiramisu: A Code Optimization Framework for High Performance Systems , 2018 .
[10] Alexander Aiken,et al. Legion: Expressing locality and independence with logical regions , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[11] Michael Garland,et al. Legate NumPy: accelerated and distributed array computing , 2019, SC.
[12] Marco D. Santambrogio,et al. A Unified Backend for Targeting FPGAs from DSLs , 2018, 2018 IEEE 29th International Conference on Application-specific Systems, Architectures and Processors (ASAP).
[13] Kunle Olukotun,et al. Spatial: a language and compiler for application accelerators , 2018, PLDI.
[14] Torsten Hoefler,et al. A data-centric approach to extreme-scale ab initio dissipative quantum transport simulations , 2019, SC.
[15] Sarita V. Adve,et al. HPVM: heterogeneous parallel virtual machine , 2018, PPoPP.
[16] Mohamed Wahib,et al. Scalable Kernel Fusion for Memory-Bound GPU Applications , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[17] L. Dagum,et al. OpenMP: an industry standard API for shared-memory programming , 1998 .
[18] Torsten Hoefler,et al. Transformations of High-Level Synthesis Codes for High-Performance Computing , 2018, IEEE Transactions on Parallel and Distributed Systems.
[19] John Shalf,et al. Trends in Data Locality Abstractions for HPC Systems , 2017, IEEE Transactions on Parallel and Distributed Systems.
[20] Amnon Barak,et al. Memory access patterns: the missing piece of the multi-GPU puzzle , 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.
[21] Bradford L. Chamberlain,et al. Parallel Programmability and the Chapel Language , 2007, Int. J. High Perform. Comput. Appl.
[22] Krishna P. Gummadi,et al. Measuring User Influence in Twitter: The Million Follower Fallacy , 2010, ICWSM.
[23] M. Abadi,et al. Naiad: a timely dataflow system , 2013, SOSP.
[24] Roberto Bruni,et al. Operational Semantics of IMP , 2017 .
[25] Hartmut Ehrig,et al. Fundamentals of Algebraic Graph Transformation (Monographs in Theoretical Computer Science. An EATCS Series) , 1992 .
[26] Alexander Aiken,et al. Regent: a high-productivity programming language for HPC with logical regions , 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.
[27] Georgi Gaydadjiev,et al. Spatial Programming with OpenSPL , 2016, FPGAs for Software Programmers.
[28] Sam Lindley,et al. Generating performance portable code using rewrite rules: from high-level functional expressions to high-performance OpenCL code , 2015, ICFP.
[29] Michael Löwe,et al. Algebraic Approach to Single-Pushout Graph Transformation , 1993, Theor. Comput. Sci.
[30] Torsten Hoefler,et al. To Push or To Pull: On Reducing Communication and Synchronization in Graph Computations , 2017, HPDC.
[31] P. Hanrahan,et al. Sequoia: Programming the Memory Hierarchy , 2006, ACM/IEEE SC 2006 Conference (SC'06).
[32] Vivek Sarkar,et al. PIPES: A Language and Compiler for Task-Based Programming on Distributed-Memory Clusters , 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.
[33] W. Fichtner,et al. Atomistic simulation of nanowires in the sp3d5s* tight-binding formalism: From boundary conditions to strain calculations , 2006 .
[34] Daniel Sunderland,et al. Kokkos: Enabling manycore performance portability through polymorphic memory access patterns , 2014, J. Parallel Distributed Comput.
[35] Jens Palsberg,et al. Concurrent Collections , 2010 .
[36] Franz Franchetti,et al. From High-Level Specification to High-Performance Code , 2018, Proc. IEEE.
[37] Alex Brooks,et al. Gluon: a communication-optimizing substrate for distributed heterogeneous graph analytics , 2018, PLDI.
[38] Jungwon Kim,et al. OpenACC to FPGA: A Framework for Directive-Based High-Performance Reconfigurable Computing , 2016, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[39] Mario Vento,et al. A (sub)graph isomorphism algorithm for matching large graphs , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[40] John Freeman,et al. From OpenCL to high-performance hardware on FPGAs , 2012, 22nd International Conference on Field Programmable Logic and Applications (FPL).
[41] Elnar Hajiyev,et al. PENCIL: A Platform-Neutral Compute Intermediate Language for Accelerator Programming , 2015, 2015 International Conference on Parallel Architecture and Compilation (PACT).
[42] Bradley N. Miller,et al. The Python Programming Language , 2006 .
[43] Uday Bondhugula,et al. PolyMage: Automatic Optimization for Image Processing Pipelines , 2015, ASPLOS.
[44] John D. Leidel,et al. Extreme Heterogeneity 2018 - Productive Computational Science in the Era of Extreme Heterogeneity: Report for DOE ASCR Workshop on Extreme Heterogeneity , 2018 .
[45] William Thies,et al. StreamIt: A Language for Streaming Applications , 2002, CC.
[46] Tze Meng Low,et al. SPIRAL: Extreme Performance Portability , 2018, Proceedings of the IEEE.
[47] Laxmikant V. Kalé,et al. CHARM++: a portable concurrent object oriented system based on C++ , 1993, OOPSLA '93.
[48] Yuan Yu,et al. TensorFlow: A system for large-scale machine learning , 2016, OSDI.
[49] Yuan Yu,et al. Dryad: distributed data-parallel programs from sequential building blocks , 2007, EuroSys '07.
[50] Uday Bondhugula,et al. A practical automatic polyhedral parallelizer and locality optimizer , 2008, PLDI '08.
[51] Mary W. Hall,et al. CHiLL: A Framework for Composing High-Level Loop Transformations , 2007 .
[52] Christian Lengauer,et al. Polly - Performing Polyhedral Optimizations on a Low-Level Intermediate Representation , 2012, Parallel Process. Lett.
[53] Robert A. van de Geijn,et al. Anatomy of high-performance matrix multiplication , 2008, TOMS.
[54] Eduard Ayguadé,et al. Supporting stateful tasks in a dataflow graph , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).
[55] Timothy A. Davis,et al. The University of Florida sparse matrix collection , 2011, TOMS.
[56] Vikram S. Adve,et al. LLVM: a compilation framework for lifelong program analysis & transformation , 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004.
[57] Francky Catthoor,et al. Polyhedral parallel code generation for CUDA , 2013, TACO.
[58] David A. Bader,et al. Graph Partitioning and Graph Clustering, 10th DIMACS Implementation Challenge Workshop, Georgia Institute of Technology, Atlanta, GA, USA, February 13-14, 2012. Proceedings , 2013, Graph Partitioning and Graph Clustering.
[59] Frédo Durand,et al. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines , 2013, PLDI 2013.
[60] Chun Chen,et al. A Programming Language Interface to Describe Transformations and Code Generation , 2010, LCPC.