Adaptive optimization for OpenCL programs on embedded heterogeneous systems
[1] Michael F. P. O'Boyle,et al. Partitioning streaming parallelism for multi-cores: A machine learning based approach , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[2] Michael F. P. O'Boyle,et al. Smart, adaptive mapping of parallelism in the presence of external workload , 2013, Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[3] Aaron Smith,et al. A machine learning approach to mapping streaming workloads to dynamic multicore processors , 2016, LCTES.
[4] Christopher C. Cummins,et al. Synthesizing benchmarks for predictive modeling , 2017, 2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[5] Sherief Reda,et al. Scheduling challenges and opportunities in integrated CPU+GPU processors , 2016, 2016 14th ACM/IEEE Symposium on Embedded Systems For Real-time Multimedia (ESTIMedia).
[6] José A. Martínez,et al. An approach to optimise the energy efficiency of iterative computation on integrated GPU–CPU systems , 2016, The Journal of Supercomputing.
[7] Michael F. P. O'Boyle,et al. Integrating profile-driven parallelism detection and machine-learning-based mapping , 2014, TACO.
[8] Margaret Martonosi,et al. GPU Performance and Power Tuning Using Regression Trees , 2015, TACO.
[9] Stijn Eyerman,et al. Probabilistic job symbiosis modeling for SMT processor scheduling , 2010, ASPLOS XV.
[10] Jarmo Takala,et al. pocl: A Performance-Portable OpenCL Implementation , 2014, International Journal of Parallel Programming.
[11] Michael F. P. O'Boyle,et al. Partitioning data-parallel programs for heterogeneous MPSoCs: time and energy design space exploration , 2014, LCTES '14.
[12] Pavlos Petoumenos,et al. Minimizing the cost of iterative compilation with active learning , 2017, 2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[13] Alan Burns,et al. Reducing the Implementation Overheads of IPCP and DFP , 2015, 2015 IEEE Real-Time Systems Symposium.
[14] Michael F. P. O'Boyle,et al. Automatic and Portable Mapping of Data Parallel Programs to OpenCL for GPU-Based Heterogeneous Systems , 2014, ACM Trans. Archit. Code Optim.
[15] Soheil Ghiasi,et al. CNNdroid: GPU-Accelerated Execution of Trained Deep Convolutional Neural Networks on Android , 2015, ACM Multimedia.
[16] Amit Kumar Singh,et al. Mapping on multi/many-core systems: Survey of current and emerging trends , 2013, 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC).
[17] Yunheung Paek,et al. Energy-Reduction Offloading Technique for Streaming Media Servers , 2016, Mob. Inf. Syst..
[18] Vivek Sarkar,et al. Automatic data layout generation and kernel mapping for CPU+GPU architectures , 2016, CC.
[19] Wei Chen,et al. GreenGPU: A Holistic Approach to Energy Efficiency in GPU-CPU Heterogeneous Architectures , 2012, 2012 41st International Conference on Parallel Processing.
[20] Ge Yu,et al. Schedulability analysis of preemptive and nonpreemptive EDF on partial runtime-reconfigurable FPGAs , 2008, TODE.
[21] Kevin Skadron,et al. Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[22] Dongyoung Kim,et al. Zero and data reuse-aware fast convolution for deep neural networks on GPU , 2016, 2016 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS).
[23] Shashank Shekhar,et al. Opportunity for compute partitioning in pursuit of energy-efficient systems , 2016, LCTES.
[24] Rainer Leupers,et al. MAPS: An integrated framework for MPSoC application parallelization , 2008, 2008 45th ACM/IEEE Design Automation Conference.
[25] A. Snavely,et al. Symbiotic jobscheduling for a simultaneous multithreading processor , 2000, SIGP.
[26] Nam Sung Kim,et al. GPUWattch: enabling energy optimizations in GPGPUs , 2013, ISCA.
[27] Yue Zhao,et al. EffiSha: A Software Framework for Enabling Efficient Preemptive Scheduling of GPU , 2017, PPoPP.
[28] Scott A. Mahlke,et al. Orchestrating Multiple Data-Parallel Kernels on Multiple Devices , 2015, 2015 International Conference on Parallel Architecture and Compilation (PACT).
[29] Michael F. P. O'Boyle,et al. Portable mapping of data parallel programs to OpenCL for heterogeneous systems , 2013, Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[30] Keshav Pingali,et al. Adaptive heterogeneous scheduling for integrated GPUs , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).
[31] Francky Catthoor,et al. Polyhedral parallel code generation for CUDA , 2013, TACO.
[32] Yunheung Paek,et al. Accelerating bootstrapping in FHEW using GPUs , 2015, 2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP).
[33] Wen-mei W. Hwu,et al. Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing , 2012.
[34] Michael F. P. O'Boyle,et al. Mapping parallelism to multi-cores: a machine learning based approach , 2009, PPoPP '09.
[35] Henry Hoffmann,et al. Bard: A unified framework for managing soft timing and power constraints , 2016, 2016 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS).
[36] Michael F. P. O'Boyle,et al. A workload-aware mapping approach for data-parallel programs , 2011, HiPEAC.
[37] Ling Gao,et al. Optimise web browsing on heterogeneous mobile platforms: A machine learning based approach , 2017, IEEE INFOCOM 2017 - IEEE Conference on Computer Communications.
[38] Michael F. P. O'Boyle,et al. Towards a holistic approach to auto-parallelization: integrating profile-driven parallelism detection and machine-learning based mapping , 2009, PLDI '09.
[39] Michael F. P. O'Boyle,et al. Smart multi-task scheduling for OpenCL programs on CPU/GPU heterogeneous platforms , 2014, 2014 21st International Conference on High Performance Computing (HiPC).
[40] R. Govindarajan,et al. Fluidic Kernels: Cooperative Execution of OpenCL Programs on Multiple Heterogeneous Devices , 2014, CGO '14.
[41] Michael F. P. O'Boyle,et al. OpenCL Task Partitioning in the Presence of GPU Contention , 2013, LCPC.
[42] Michael F. P. O'Boyle,et al. Automatic optimization of thread-coarsening for graphics processors , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).