Warp-Based Load/Store Reordering to Improve GPU Time Predictability
[1] John E. Stone, et al. OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems, 2010, Computing in Science & Engineering.
[2] John D. Owens, et al. GPU Computing, 2008, Proceedings of the IEEE.
[3] Andrew W. Fitzgibbon, et al. Real-time human pose recognition in parts from single depth images, 2011, CVPR 2011.
[4] Henry Wong, et al. Analyzing CUDA workloads using a detailed GPU simulator, 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[5] Damien Hardy, et al. WCET Analysis of Multi-level Non-inclusive Set-Associative Instruction Caches, 2008, 2008 Real-Time Systems Symposium.
[6] Björn Lisper, et al. Data cache locking for higher program predictability, 2003, SIGMETRICS '03.
[7] Björn Andersson, et al. Assigning real-time tasks on heterogeneous multiprocessors with two unrelated types of processors, 2010, 2010 31st IEEE Real-Time Systems Symposium.
[8] James H. Anderson, et al. Globally scheduled real-time multiprocessor systems with GPUs, 2011, Real-Time Systems.
[9] Kevin Skadron, et al. Rodinia: A benchmark suite for heterogeneous computing, 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[10] Eduardo Tovar, et al. WCET Measurement-based and Extreme Value Theory Characterisation of CUDA Kernels, 2014, RTNS.
[11] Wei Zhang, et al. WCET Analysis for Multi-Core Processors with Shared L2 Instruction Caches, 2008, 2008 IEEE Real-Time and Embedded Technology and Applications Symposium.
[12] Martin Schoeberl, et al. A Time Predictable Instruction Cache for a Java Processor, 2004, OTM Workshops.
[13] Peter Marwedel, et al. Scratchpad memory: a design alternative for cache on-chip memory in embedded systems, 2002, Proceedings of the Tenth International Symposium on Hardware/Software Codesign. CODES 2002 (IEEE Cat. No.02TH8627).
[14] Kevin Skadron, et al. Accelerating Compute-Intensive Applications with GPUs and FPGAs, 2008, 2008 Symposium on Application Specific Processors.
[15] Abhik Roychoudhury, et al. Scope-Aware Data Cache Analysis for WCET Estimation, 2011, 2011 17th IEEE Real-Time and Embedded Technology and Applications Symposium.
[16] Assaf Schuster, et al. Processing data streams with hard real-time constraints on heterogeneous systems, 2011, ICS '11.
[17] John D. Owens, et al. Real-time parallel hashing on the GPU, 2009, SIGGRAPH 2009.
[18] Margaret Martonosi, et al. MRPB: Memory request prioritization for massively parallel processors, 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[19] Yun Liang, et al. WCET-centric partial instruction cache locking, 2012, DAC Design Automation Conference 2012.
[20] Reinhard Wilhelm, et al. Cache Behavior Prediction by Abstract Interpretation, 1996, Sci. Comput. Program.
[21] Adam Betts, et al. Estimating the WCET of GPU-Accelerated Applications Using Hybrid Analysis, 2013, 2013 25th Euromicro Conference on Real-Time Systems.
[22] Yun Liang, et al. Timing analysis of concurrent programs running on shared cache multi-cores, 2009, 2009 30th IEEE Real-Time Systems Symposium.
[23] Yun Liang, et al. An efficient compiler framework for cache bypassing on GPUs, 2013, 2013 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).