Power-efficient computing for compute-intensive GPGPU applications
暂无分享,去创建一个
[1] Michael J. Schulte,et al. Memory latency consideration for load sharing on heterogeneous network of workstations , 2006 .
[2] Margaret Martonosi,et al. Value-based clock gating and operation packing: dynamic strategies for improving processor power and performance , 2000, TOCS.
[3] Hyesoon Kim,et al. An integrated GPU power and performance model , 2010, ISCA.
[4] Mikko H. Lipasti,et al. Macro-op scheduling: relaxing scheduling loop constraints , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..
[5] Christoforos E. Kozyrakis,et al. Understanding sources of inefficiency in general-purpose chips , 2010, ISCA.
[6] Kanad Ghose,et al. Register Packing: Exploiting Narrow-Width Operands for Reducing Register File Pressure , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).
[7] Mark Horowitz,et al. Energy-Efficient Floating-Point Unit Design , 2011, IEEE Transactions on Computers.
[8] Henry Wong,et al. Analyzing CUDA workloads using a detailed GPU simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[9] Andreas Moshovos,et al. Demystifying GPU microarchitecture through microbenchmarking , 2010, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS).
[10] William J. Dally,et al. Energy-efficient mechanisms for managing thread context in throughput processors , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).
[11] Nam Sung Kim,et al. Improving Throughput of Power-Constrained GPUs Using Dynamic Voltage/Frequency and Core Scaling , 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.
[12] Kevin Skadron,et al. Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[13] William J. Dally,et al. A compile-time managed multi-level register file hierarchy , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[14] Li Shen,et al. A New Architecture For Multiple-Precision Floating-Point Multiply-Add Fused Unit Design , 2007, 18th IEEE Symposium on Computer Arithmetic (ARITH '07).
[15] Erik Lindholm,et al. NVIDIA Tesla: A Unified Graphics and Computing Architecture , 2008, IEEE Micro.
[16] Michael J. Schulte,et al. ERCBench: An Open-Source Benchmark Suite for Embedded and Reconfigurable Computing , 2010, 2010 International Conference on Field Programmable Logic and Applications.