Emerald: Graphics Modeling for SoC Systems
[1] David Black-Schaffer, et al. A graphics tracing framework for exploring CPU+GPU memory systems, 2017, 2017 IEEE International Symposium on Workload Characterization (IISWC).
[2] Kevin Skadron, et al. A flexible simulation framework for graphics architectures, 2004, Graphics Hardware.
[3] Onur Mutlu, et al. The Locality Descriptor: A Holistic Cross-Layer Abstraction to Express Data Locality In GPUs, 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[4] Erik Lindholm, et al. NVIDIA Tesla: A Unified Graphics and Computing Architecture, 2008, IEEE Micro.
[5] Mahmut T. Kandemir, et al. Domain knowledge based energy management in handhelds, 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[6] Mor Harchol-Balter, et al. Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior, 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[7] Henk Corporaal, et al. Locality-Aware CTA Clustering for Modern GPUs, 2017, ASPLOS.
[8] Ronald G. Dreslinski, et al. Sources of error in full-system simulation, 2014, 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[9] Kevin Kai-Wei Chang, et al. DASH: Deadline-Aware High-Performance Memory Scheduler for Heterogeneous Systems with Hardware Accelerators, 2016, ACM Trans. Archit. Code Optim.
[10] David R. Kaeli, et al. Multi2Sim: A simulation framework for CPU-GPU computing, 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).
[11] Jem Davies. The Bifrost GPU architecture and the ARM Mali-G71 GPU, 2016, 2016 IEEE Hot Chips 28 Symposium (HCS).
[12] John Kim, et al. Improving GPGPU resource utilization through alternative thread block scheduling, 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[13] Mattan Erez, et al. A QoS-aware memory controller for dynamically balancing GPU and CPU bandwidth use in an MPSoC, 2012, DAC Design Automation Conference 2012.
[14] Stijn Eyerman, et al. An Evaluation of High-Level Mechanistic Core Models, 2014, ACM Trans. Archit. Code Optim.
[15] Jose-Maria Arnau, et al. TEAPOT: a toolset for evaluating performance, power and image quality on mobile graphics systems, 2013, ICS '13.
[16] Thomas F. Wenisch, et al. Simulating DRAM controllers for future system architecture exploration, 2014, 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[17] Mahmut T. Kandemir, et al. VIP: Virtualizing IP chains on handheld platforms, 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[18] David A. Wood, et al. gem5-gpu: A Heterogeneous CPU-GPU Simulator, 2015, IEEE Computer Architecture Letters.
[19] Lei Yang, et al. Temporal Coherence Methods in Real‐Time Rendering, 2012, Comput. Graph. Forum.
[20] Mainak Chaudhuri, et al. Improving CPU Performance Through Dynamic GPU Access Throttling in CPU-GPU Heterogeneous Processors, 2017, 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).
[21] Jose-Maria Arnau, et al. Parallel frame rendering: Trading responsiveness for energy on a mobile GPU, 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.
[22] Juan L. Aragón, et al. Early Visibility Resolution for Removing Ineffectual Computations in the Graphics Pipeline, 2019, 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[23] Mark Segal, et al. The OpenGL Graphics System: A Specification, 2004.
[24] Andreas Sandberg, et al. NoMali: Simulating a realistic graphics driver stack using a stub GPU, 2016, 2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[25] Yuan Yao, et al. Aggregate Flow-Based Performance Fairness in CMPs, 2016, ACM Trans. Archit. Code Optim.
[26] Mahmut T. Kandemir, et al. GemDroid: a framework to evaluate mobile platforms, 2014, SIGMETRICS '14.
[27] Rami G. Melhem, et al. Simultaneous Multikernel GPU: Multi-tasking throughput processors via fine-grained sharing, 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[28] Mahmut T. Kandemir, et al. Exploiting Core Criticality for Enhanced GPU Performance, 2016, SIGMETRICS.
[29] Mahmut T. Kandemir, et al. Short-Circuiting Memory Traffic in Handheld Platforms, 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[30] Jose-Maria Arnau, et al. Eliminating redundant fragment shader executions on a mobile GPU via hardware memoization, 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).
[31] Antonio González, et al. Visibility Rendering Order: Improving Energy Efficiency on Mobile GPUs through Frame Coherence, 2019, IEEE Transactions on Parallel and Distributed Systems.
[32] Onur Mutlu, et al. The Blacklisting Memory Scheduler: Achieving high performance and fairness at low cost, 2014, 2014 IEEE 32nd International Conference on Computer Design (ICCD).
[33] Carlos González, et al. ATTILA: a cycle-level execution-driven simulator for modern GPU architectures, 2006, 2006 IEEE International Symposium on Performance Analysis of Systems and Software.
[34] Karthikeyan Sankaralingam, et al. Dark Silicon and the End of Multicore Scaling, 2012, IEEE Micro.
[35] Gu-Yeon Wei, et al. Co-designing accelerators and SoC interfaces using gem5-Aladdin, 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[36] Henry Wong, et al. Analyzing CUDA workloads using a detailed GPU simulator, 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[37] Kevin Kai-Wei Chang, et al. Staged memory scheduling: Achieving high performance and scalability in heterogeneous systems, 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).
[38] Somayeh Sardashti, et al. The gem5 simulator, 2011, CARN.
[39] Mor Harchol-Balter, et al. ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers, 2010.
[40] Cheol Hong Kim, et al. A dynamic CTA scheduling scheme for massive parallel computing, 2017, Cluster Computing.