FusionCL: a machine-learning based approach for OpenCL kernel fusion to increase system performance
Muhammad Arshad Islam | Muhammad Aleem | Muhammad Azhar Iqbal | Usman Ahmed | Yasir Noman Khalid | Radu Prodan
[1] Michael F. P. O'Boyle, et al. Portable and transparent software managed scheduling on accelerators for fair resource sharing, 2016, 2016 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[2] Yuan Wen. Multi-tasking scheduling for heterogeneous systems, 2017.
[3] J. Friedman. Stochastic gradient boosting, 2002.
[4] Michael F. P. O'Boyle, et al. Smart multi-task scheduling for OpenCL programs on CPU/GPU heterogeneous platforms, 2014, 2014 21st International Conference on High Performance Computing (HiPC).
[5] Daniel J. Sorin, et al. Exploring memory consistency for massively-threaded throughput-oriented processors, 2013, ISCA.
[6] Yifan Sun, et al. Valkyrie: Leveraging Inter-TLB Locality to Enhance GPU Performance, 2020, PACT.
[7] Kevin Skadron, et al. Dynamic Heterogeneous Scheduling Decisions Using Historical Runtime Data, 2011.
[8] Neil C. Thompson, et al. The decline of computers as a general purpose technology, 2021, Commun. ACM.
[9] Mahmut T. Kandemir, et al. Anatomy of GPU Memory System for Multi-Application Execution, 2015, MEMSYS.
[10] Lifan Xu, et al. Auto-tuning a high-level language targeted to GPU codes, 2012, 2012 Innovative Parallel Computing (InPar).
[11] Aaftab Munshi, et al. The OpenCL specification, 2009, 2009 IEEE Hot Chips 21 Symposium (HCS).
[12] Wei Jiang, et al. Scheduling Concurrent Applications on a Cluster of CPU-GPU Nodes, 2012, 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012).
[13] Schahram Dustdar, et al. Optimized container scheduling for data-intensive serverless edge computing, 2021, Future Gener. Comput. Syst.
[14] Volker Lindenstruth, et al. An Energy-Efficient Multi-GPU Supercomputer, 2014, 2014 IEEE Intl Conf on High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC,CSS,ICESS).
[15] Xiaokang Yang, et al. GPU accelerated high-quality video/image super-resolution, 2016, 2016 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB).
[16] Michail Papadimitriou, et al. Multiple-tasks on multiple-devices (MTMD): exploiting concurrency in heterogeneous managed runtimes, 2021, VEE.
[17] Jianlong Zhong, et al. Kernelet: High-Throughput GPU Kernel Executions with Dynamic Slicing and Scheduling, 2013, IEEE Transactions on Parallel and Distributed Systems.
[18] Tulika Mitra, et al. Improving GPGPU energy-efficiency through concurrent kernel execution and DVFS, 2015, 2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[19] Pradeep Dubey, et al. Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU, 2010, ISCA.
[20] Zhongliang Chen, et al. MGPUSim: Enabling Multi-GPU Performance Modeling and Optimization, 2019, 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA).
[21] Thomas Fahringer, et al. An automatic input-sensitive approach for heterogeneous task partitioning, 2013, ICS '13.
[22] Michael F. P. O'Boyle, et al. MaxPair: Enhance OpenCL Concurrent Kernel Execution by Weighted Maximum Matching, 2018, GPGPU@PPoPP.
[23] Guojie Luo, et al. Corolla: GPU-Accelerated FPGA Routing Based on Subgraph Dynamic Expansion, 2017, FPGA.
[24] Michael F. P. O'Boyle, et al. Merge or Separate?: Multi-job Scheduling for OpenCL Kernels on CPU/GPU Platforms, 2017, GPGPU@PPoPP.
[25] Muhammad Arshad Islam, et al. RALB‐HC: A resource‐aware load balancer for heterogeneous cluster, 2019.
[26] Andreas Kopmann, et al. Balancing Load of GPU Subsystems to Accelerate Image Reconstruction in Parallel Beam Tomography, 2018, 2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD).
[27] Gérard Biau, et al. Accelerated gradient boosting, 2018, Machine Learning.
[28] Duksu Kim, et al. HPMaX: heterogeneous parallel matrix multiplication using CPUs and GPUs, 2020, Computing.
[29] Sachin Singh Gautam, et al. GPU-based matrix-free finite element solver exploiting symmetry of elemental matrices, 2020, Computing.
[30] Carole-Jean Wu, et al. Performance characterization, prediction, and optimization for heterogeneous systems with multi-level memory interference, 2017, 2017 IEEE International Symposium on Workload Characterization (IISWC).
[31] Rachata Ausavarungnirun, et al. MASK: Redesigning the GPU Memory Hierarchy to Support Multi-Application Concurrency, 2018, ASPLOS.
[32] Keshav Pingali, et al. Adaptive heterogeneous scheduling for integrated GPUs, 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).
[33] Randal S. Olson, et al. Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science, 2016, GECCO.
[34] Jong-Myon Kim, et al. An efficient scheduling scheme using estimated execution time for heterogeneous computing systems, 2013, The Journal of Supercomputing.
[35] Muhammad Arshad Islam, et al. Troodon: A machine-learning based load-balancing application scheduler for CPU-GPU system, 2019, J. Parallel Distributed Comput.
[36] R. Govindarajan, et al. Improving GPGPU concurrency with elastic kernels, 2013, ASPLOS '13.
[37] Laxmi N. Bhuyan, et al. CuMAS: Data Transfer Aware Multi-Application Scheduling for Shared GPUs, 2016, ICS.
[38] Radu Prodan, et al. E-OSched: a load balancing scheduler for heterogeneous multicores, 2018, The Journal of Supercomputing.
[39] Kevin Skadron, et al. Load balancing in a changing world: dealing with heterogeneity and performance variability, 2013, CF '13.
[40] Ramón Beivide, et al. Simplifying programming and load balancing of data parallel applications on heterogeneous systems, 2016, GPGPU@PPoPP.
[41] Michael F. P. O'Boyle, et al. A Static Task Partitioning Approach for Heterogeneous Systems Using OpenCL, 2011, CC.