High-performance dataflow computing in hybrid memory systems with UPC++ DepSpawn
[1] Jesús Labarta,et al. A high‐productivity task‐based programming model for clusters , 2012, Concurr. Comput. Pract. Exp..
[2] Basilio B. Fraguela,et al. A framework for argument-based task synchronization with automatic detection of dependencies , 2013, Parallel Comput..
[3] Katherine Yelick,et al. UPC Language Specifications V1.1.1 , 2003 .
[4] Bradford L. Chamberlain,et al. Parallel Programmability and the Chapel Language , 2007, Int. J. High Perform. Comput. Appl..
[5] Jarek Nieplocha,et al. Advances, Applications and Performance of the Global Arrays Shared Memory Programming Toolkit , 2006, Int. J. High Perform. Comput. Appl..
[6] Thomas Hérault,et al. Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA , 2011, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum.
[7] Thomas Hérault,et al. DAGuE: A Generic Distributed DAG Engine for High Performance Computing , 2011, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum.
[8] Basilio B. Fraguela,et al. A Comparison of Task Parallel Frameworks based on Implicit Dependencies in Multi-core Environments , 2017, HICSS.
[9] Scott B. Baden,et al. UPC++: A High-Performance Communication Framework for Asynchronous Computation , 2019, 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[10] Basilio B. Fraguela,et al. Easy Dataflow Programming in Clusters with UPC++ DepSpawn , 2019, IEEE Transactions on Parallel and Distributed Systems.
[11] Emmanuel Agullo,et al. Harnessing clusters of hybrid nodes with a sequential task-based programming model , 2014 .
[12] Thorsten Kurth,et al. MPI usage at NERSC: Present and Future , 2016, EuroMPI.
[13] Katherine A. Yelick,et al. UPC++: A PGAS Extension for C++ , 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[14] Daniel S. Katz,et al. Swift/T: Large-Scale Application Composition via Distributed-Memory Dataflow Processing , 2013, 2013 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing.
[15] Eduard Ayguadé,et al. Implementing OmpSs support for regions of data in architectures with multiple address spaces , 2013, ICS '13.
[16] George Bosilca,et al. PaRSEC in Practice: Optimizing a Legacy Chemistry Application through Distributed Task-Based Execution , 2015, 2015 IEEE International Conference on Cluster Computing.
[17] William Pugh,et al. The Omega test: A fast and practical integer programming algorithm for dependence analysis , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[18] Jason Duell,et al. Productivity and performance using partitioned global address space languages , 2007, PASCO '07.
[19] Alexander Aiken,et al. Regent: a high-productivity programming language for HPC with logical regions , 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.
[20] Michel Cosnard,et al. Automatic Task Graph Generation Techniques , 1995, Proceedings of the 28th Annual Hawaii International Conference on System Sciences.
[21] Robert W. Numrich,et al. Co-array Fortran for parallel programming , 1998, FORF.
[22] Alexander Aiken,et al. Legion: Expressing locality and independence with logical regions , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[23] Cédric Augonnet,et al. StarPU: a unified platform for task scheduling on heterogeneous multicore architectures , 2011, Concurr. Comput. Pract. Exp..
[24] Vivek Sarkar,et al. X10: an object-oriented approach to non-uniform cluster computing , 2005, OOPSLA '05.