Easy Dataflow Programming in Clusters with UPC++ DepSpawn