Exploiting access semantics and program behavior to reduce snoop power in chip multiprocessors
[1] M. Smelyanskiy,et al. Stack value file: custom microarchitecture for the stack , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.
[2] P. Stenstrom,et al. TLB and snoop energy-reduction using virtual caches in low-power chip-multiprocessors , 2002, Proceedings of the International Symposium on Low Power Electronics and Design.
[3] Mark S. Squillante,et al. Using Processor-Cache Affinity Information in Shared-Memory Multiprocessor Scheduling , 1993, IEEE Trans. Parallel Distributed Syst..
[4] Simha Sethumadhavan,et al. Late-binding: enabling unordered load-store queues , 2007, ISCA '07.
[5] Mikko H. Lipasti,et al. Improving multiprocessor performance with coarse-grain coherence tracking , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[6] Luiz André Barroso,et al. Piranha: a scalable architecture based on single-chip multiprocessing , 2000, Proceedings of 27th International Symposium on Computer Architecture.
[7] Ronak Singhal,et al. Performance Analysis and Validation of the Intel Pentium 4 Processor on 90nm Technology , 2004 .
[8] Irith Pomeranz,et al. Transient-fault recovery for chip multiprocessors , 2003, 30th Annual International Symposium on Computer Architecture, 2003.
[9] Greg Hamerly,et al. SimPoint 3.0: Faster and More Flexible Program Phase Analysis , 2005 .
[10] Babak Falsafi,et al. JETTY: filtering snoops for reduced energy consumption in SMP servers , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.
[11] Hsien-Hsin S. Lee,et al. An Integrated Framework for Dependable and Revivable Architectures Using Multicore Processors , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).
[12] Andreas Moshovos. RegionScout: exploiting coarse grain sharing in snoop-based coherence , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[13] Per Stenström,et al. Evaluation of Snoop-Energy Reduction Techniques for Chip-Multiprocessors , 2002, ISCA 2002.
[14] D. Novillo. OpenMP and automatic parallelization in GCC , 2006 .
[15] Eric Dahlen,et al. The 82460GX Server/Workstation Chip Set , 2000, IEEE Micro.
[16] Peter Petrov,et al. Energy-Efficient Cache Coherence for Embedded Multi-Processor Systems through Application-Driven Snoop Filtering , 2006, 9th EUROMICRO Conference on Digital System Design (DSD'06).
[17] K. Sundaramoorthy,et al. Slipstream processors: improving both performance and fault tolerance , 2000, SIGP.
[18] Burton H. Bloom,et al. Space/time trade-offs in hash coding with allowable errors , 1970, CACM.
[19] Mikko H. Lipasti,et al. Power-Efficient Cache Coherence , 2004 .
[20] Wen-Hann Wang,et al. On the inclusion properties for multi-level cache hierarchies , 1988, ISCA '88.
[21] William J. Dally,et al. Design tradeoffs for tiled CMP on-chip networks , 2006, ICS '06.
[22] Gary S. Tyson,et al. Region-based caching: an energy-delay efficient memory architecture for embedded processors , 2000, CASES '00.
[23] Hsien-Hsin S. Lee,et al. Energy efficient D-TLB and data cache using semantic-aware multilateral partitioning , 2003, ISLPED '03.
[24] Shreekant S. Thakkar,et al. Multiprocessor validation of the Pentium Pro microprocessor , 1996, COMPCON '96. Technologies for the Information Superhighway Digest of Papers.
[25] Xin-Min Tian,et al. Intel OpenMP C++/Fortran Compiler for Hyper-Threading Technology: Implementation and Performance , 2002 .
[26] Amir Roth,et al. Store vulnerability window (SVW): re-execution filtering for enhanced load optimization , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[27] Avi Mendelson,et al. CMP Implementation in Systems Based on the Intel Core Duo Processor , 2006 .
[28] Alon Naveh,et al. Power and Thermal Management in the Intel Core Duo Processor , 2006 .
[29] Brad Calder,et al. SimPoint 3.0: Faster and More Flexible Program Phase Analysis , 2005, J. Instr. Level Parallelism.
[30] Hsien-Hsin S. Lee,et al. Efficient System-on-Chip Energy Management with a Segmented Bloom Filter , 2006, ARCS.
[31] Michael Stumm,et al. A performance comparison of hierarchical ring- and mesh-connected multiprocessor networks , 1997, Proceedings Third International Symposium on High-Performance Computer Architecture.