Taskflow: A Lightweight Parallel and Heterogeneous Task Graph Computing System
Tsung-Wei Huang | Yibo Lin | Chun-Xun Lin | Dian-Lun Lin
[1] Martin D. F. Wong, et al. GPU-accelerated Path-based Timing Analysis, 2021, 2021 58th ACM/IEEE Design Automation Conference (DAC).
[2] Bruno Raffin, et al. Design and analysis of scheduling strategies for multi-CPU and multi-GPU architectures, 2015, Parallel Comput.
[3] Bruno Raffin, et al. XKaapi: A Runtime System for Data-Flow Task Programming on Heterogeneous Architectures, 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[4] Quan Chen, et al. LAWS: locality-aware work-stealing for multi-socket multi-core architectures, 2014, ICS '14.
[5] Yuxiong He, et al. Adaptive work-stealing with parallelism feedback, 2008, TOCS.
[6] Vivek Sarkar, et al. A scalable locality-aware adaptive work-stealing scheduler for multi-core task parallelism, 2010.
[7] Keshav Pingali, et al. Can Parallel Programming Revolutionize EDA Tools?, 2018, Advanced Logic Synthesis.
[8] Cédric Augonnet, et al. StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, 2011, Concurr. Comput. Pract. Exp.
[9] Marco Danelutto, et al. FastFlow: High-level and Efficient Streaming on Multi-core, 2017.
[10] Martin D. F. Wong, et al. OpenTimer v2: A New Parallel Incremental Timing Analysis Engine, 2021, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[11] Charles E. Leiserson, et al. Executing task graphs using work-stealing, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[12] Nan Sun, et al. MAGICAL: Toward Fully Automated Analog IC Layout Leveraging Human and Machine Intelligence: Invited Paper, 2019, 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[13] Tsung-Wei Huang, et al. GPU-Accelerated Static Timing Analysis, 2020, 2020 IEEE/ACM International Conference On Computer Aided Design (ICCAD).
[14] Jürgen Teich, et al. The Best of Both Worlds: Combining CUDA Graph with an Image Processing DSL, 2020, 2020 57th ACM/IEEE Design Automation Conference (DAC).
[15] Martin D. F. Wong, et al. An Efficient Work-Stealing Scheduler for Task Dependency Graph, 2020, 2020 IEEE 26th International Conference on Parallel and Distributed Systems (ICPADS).
[16] Sriram Krishnamoorthy, et al. Lifeline-based global load balancing, 2011, PPoPP '11.
[17] Martín Abadi, et al. Dynamic control flow in large-scale machine learning, 2018, EuroSys.
[18] Dieter Schmalstieg, et al. Whippletree, 2014, ACM Trans. Graph.
[19] D. F. Wong, et al. Simulated Annealing for VLSI Design, 1988.
[20] Jin Hu, et al. TAU 2015 contest on incremental timing analysis, 2015, 2015 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[21] Sriram Krishnamoorthy, et al. Scalable work stealing, 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[22] Martin D. F. Wong, et al. Cpp-Taskflow: Fast Task-Based Parallel Programming Using Modern C++, 2019, 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[23] Albert Cohen, et al. Correct and efficient work-stealing for weak memory models, 2013, PPoPP '13.
[24] Charles E. Leiserson, et al. On the efficiency of localized work stealing, 2016, Inf. Process. Lett.
[25] Eduardo Quiñones, et al. OpenMP to CUDA graphs: a compiler-based transformation to enhance the programmability of NVIDIA devices, 2020, SCOPES.
[26] Samy Bengio, et al. Device Placement Optimization with Reinforcement Learning, 2017, ICML.
[27] Daniel Sunderland, et al. Kokkos: Enabling manycore performance portability through polymorphic memory access patterns, 2014, J. Parallel Distributed Comput.
[28] Wolfram Schulte, et al. The design of a task parallel library, 2009, OOPSLA '09.
[29] Jeremy Kepner, et al. Sparse Deep Neural Network Graph Challenge, 2019, 2019 IEEE High Performance Extreme Computing Conference (HPEC).
[30] Charles E. Leiserson, et al. The Cilk++ concurrency platform, 2009, 2009 46th ACM/IEEE Design Automation Conference.
[31] Xiaoning Ding, et al. BWS: balanced work stealing for time-sharing multicores, 2012, EuroSys '12.
[32] Hartmut Kaiser, et al. HPX: A Task Based Programming Model in a Global Address Space, 2014, PGAS.
[33] Rudolf Eigenmann, et al. OpenMPC: Extended OpenMP Programming and Tuning for GPUs, 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[34] Laxmikant V. Kalé, et al. CHARM++: a portable concurrent object oriented system based on C++, 1993, OOPSLA '93.
[35] Seyong Lee, et al. Early evaluation of directive-based GPU programming models for productive exascale computing, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[36] Tsung-Wei Huang, et al. Taskflow: A General-Purpose Parallel and Heterogeneous Task Programming System, 2022, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[37] Andrew B. Kahng, et al. INVITED: Toward an Open-Source Digital Flow: First Learnings from the OpenROAD Project, 2019, 2019 56th ACM/IEEE Design Automation Conference (DAC).
[38] Yu David Liu, et al. Energy-efficient work-stealing language runtimes, 2014, ASPLOS.
[39] Olivier Tardieu, et al. A work-stealing scheduler for X10's task parallelism with suspension, 2012, PPoPP '12.
[40] Martin D. F. Wong, et al. Cpp-Taskflow: A General-Purpose Parallel Task Programming System at Scale, 2021, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[41] Quan Chen, et al. Bandwidth and Locality Aware Task-stealing for Manycore Architectures with Bandwidth-Asymmetric Memory, 2018, ACM Trans. Archit. Code Optim.
[42] Mauro Bisson, et al. A GPU Implementation of the Sparse Deep Neural Network Graph Challenge, 2019, 2019 IEEE High Performance Extreme Computing Conference (HPEC).
[43] David Z. Pan, et al. ABCDPlace: Accelerated Batch-Based Concurrent Detailed Placement on Multithreaded CPUs and GPUs, 2020, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[44] Alexander Aiken, et al. Legion: Expressing locality and independence with logical regions, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[45] C. Greg Plaxton, et al. Thread Scheduling for Multiprogrammed Multiprocessors, 1998, SPAA '98.
[46] Thomas Hérault, et al. PaRSEC: Exploiting Heterogeneity to Enhance Scalability, 2013, Computing in Science & Engineering.