Power-efficient computing for compute-intensive GPGPU applications
暂无分享,去创建一个
[1] Edward T. Grochowski,et al. Larrabee: A many-Core x86 architecture for visual computing , 2008, 2008 IEEE Hot Chips 20 Symposium (HCS).
[2] Christoforos E. Kozyrakis,et al. Understanding sources of inefficiency in general-purpose chips , 2010, ISCA.
[3] Michael J. Schulte,et al. Dual-mode floating-point multiplier architectures with parallel operations , 2006, J. Syst. Archit..
[4] William J. Dally,et al. Energy-efficient mechanisms for managing thread context in throughput processors , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).
[5] Mikko H. Lipasti,et al. Macro-op Scheduling: Relaxing Scheduling Loop Constraints , 2003, MICRO.
[6] Mark Horowitz,et al. Energy-Efficient Floating-Point Unit Design , 2011, IEEE Transactions on Computers.
[7] William J. Dally,et al. A compile-time managed multi-level register file hierarchy , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[8] Li Shen,et al. A New Architecture For Multiple-Precision Floating-Point Multiply-Add Fused Unit Design , 2007, 18th IEEE Symposium on Computer Arithmetic (ARITH '07).
[9] Henry Wong,et al. Analyzing CUDA workloads using a detailed GPU simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[10] Nam Sung Kim,et al. Improving Throughput of Power-Constrained GPUs Using Dynamic Voltage/Frequency and Core Scaling , 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.
[11] Kanad Ghose,et al. Register Packing: Exploiting Narrow-Width Operands for Reducing Register File Pressure , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).
[12] Erik Lindholm,et al. NVIDIA Tesla: A Unified Graphics and Computing Architecture , 2008, IEEE Micro.
[13] Michael J. Schulte,et al. ERCBench: An Open-Source Benchmark Suite for Embedded and Reconfigurable Computing , 2010, 2010 International Conference on Field Programmable Logic and Applications.
[14] Andreas Moshovos,et al. Demystifying GPU microarchitecture through microbenchmarking , 2010, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS).
[15] Hyesoon Kim,et al. An integrated GPU power and performance model , 2010, ISCA.
[16] David Harris,et al. CMOS VLSI Design: A Circuits and Systems Perspective , 2004 .
[17] Kevin Skadron,et al. Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[18] Margaret Martonosi,et al. Value-based clock gating and operation packing: dynamic strategies for improving processor power and performance , 2000, TOCS.