WCET analysis of the shared data cache in integrated CPU-GPU architectures
[1] Henry Wong, et al. Analyzing CUDA workloads using a detailed GPU simulator, 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[2] Kristof Beyls, et al. Reuse Distance as a Metric for Cache Behavior, 2001.
[3] Mitsuhisa Sato, et al. GPU/CPU Work Sharing with Parallel Language XcalableMP-dev for Parallelized Accelerated Computing, 2012, 2012 41st International Conference on Parallel Processing Workshops.
[4] Thomas Fahringer, et al. An automatic input-sensitive approach for heterogeneous task partitioning, 2013, ICS '13.
[5] Yutao Zhong, et al. Predicting whole-program locality through reuse distance analysis, 2003, PLDI '03.
[6] Adam Betts, et al. Estimating the WCET of GPU-Accelerated Applications Using Hybrid Analysis, 2013, 2013 25th Euromicro Conference on Real-Time Systems.
[7] Kevin Skadron, et al. Rodinia: A benchmark suite for heterogeneous computing, 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[8] Henk Corporaal, et al. A detailed GPU cache model based on reuse distance theory, 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[9] David A. Wood, et al. gem5-gpu: A Heterogeneous CPU-GPU Simulator, 2015, IEEE Computer Architecture Letters.
[10] Chao Yang, et al. A peta-scalable CPU-GPU algorithm for global atmospheric simulations, 2013, PPoPP '13.
[11] Marco Caccamo, et al. Real-time cache management framework for multi-core architectures, 2013, 2013 IEEE 19th Real-Time and Embedded Technology and Applications Symposium (RTAS).
[12] Somayeh Sardashti, et al. The gem5 simulator, 2011, CARN.
[13] Javier Cuenca, et al. Optimization Techniques for 3D-FWT on Systems with Manycore GPUs and Multicore CPUs, 2013, ICCS.
[14] Francisco J. Cazorla, et al. Hardware support for WCET analysis of hard real-time multicore systems, 2009, ISCA '09.
[15] Tulika Mitra, et al. Modeling shared cache and bus in multi-cores for timing analysis, 2010, SCOPES.
[16] Dong Li, et al. The tradeoffs of fused memory hierarchies in heterogeneous computing architectures, 2012, CF '12.
[17] David R. Kaeli, et al. Quantifying the energy efficiency of FFT on heterogeneous platforms, 2013, 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[18] Lui Sha, et al. Real-Time Computing on Multicore Processors, 2016, Computer.
[19] Laxmi N. Bhuyan, et al. A dynamic self-scheduling scheme for heterogeneous multiprocessor architectures, 2013, TACO.
[20] James H. Anderson, et al. Outstanding Paper Award: Making Shared Caches More Predictable on Multicore Platforms, 2013, 2013 25th Euromicro Conference on Real-Time Systems.
[21] Eduardo Tovar, et al. WCET Measurement-based and Extreme Value Theory Characterisation of CUDA Kernels, 2014, RTNS.
[22] Tomasz P. Stefanski. Implementation of FDTD-Compatible Green's Function on Heterogeneous CPU-GPU Parallel Processing System, 2013.
[23] Wei Jiang, et al. MATE-CG: A Map Reduce-Like Framework for Accelerating Data-Intensive Computations on Heterogeneous Clusters, 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[24] Anand Raghunathan, et al. Automatic generation of software pipelines for heterogeneous parallel systems, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[25] Wu-chun Feng, et al. On the Efficacy of a Fused CPU+GPU Processor (or APU) for Parallel Computing, 2011, 2011 Symposium on Application Accelerators in High-Performance Computing.
[26] Wei Zhang, et al. Static WCET Analysis of GPUs with Predictable Warp Scheduling, 2017, 2017 IEEE 20th International Symposium on Real-Time Distributed Computing (ISORC).