Chip multiprocessors for server workloads
Babak Falsafi | Nikolaos Hardavellas | Anastasia Ailamaki
[1] William J. Dally,et al. Design tradeoffs for tiled CMP on-chip networks , 2006, ICS '06.
[2] John Goodacre,et al. ARM MPCore; The streamlined and scalable ARM11 processor core , 2007, 2007 Asia and South Pacific Design Automation Conference.
[3] T. N. Vijaykumar,et al. Distance associativity for high-performance energy-efficient non-uniform cache architectures , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36.
[4] Philip G. Emma,et al. Cache miss behavior: is it sqrt(2)? , 2006 .
[5] Kunle Olukotun,et al. Niagara: a 32-way multithreaded Sparc processor , 2005, IEEE Micro.
[6] Luiz André Barroso,et al. Memory system characterization of commercial workloads , 1998, ISCA.
[7] David A. Wood,et al. Managing Wire Delay in Large Chip-Multiprocessor Caches , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).
[8] Kevin Skadron,et al. CMP design space exploration subject to physical constraints , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006.
[9] Seth Copen Goldstein,et al. Spatial computation , 2004, ASPLOS XI.
[10] Wolf-Dietrich Weber,et al. Power provisioning for a warehouse-sized computer , 2007, ISCA '07.
[11] T. N. Vijaykumar,et al. Optimizing Replication, Communication, and Capacity Allocation in CMPs , 2005, ISCA 2005.
[12] Thomas F. Wenisch,et al. Store-ordered streaming of shared memory , 2005, 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05).
[13] Marcelo Cintra,et al. An OS-based alternative to full hardware coherence on tiled CMPs , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.
[14] Sarita V. Adve,et al. Performance of database workloads on shared-memory systems with out-of-order processors , 1998, ASPLOS VIII.
[15] Ann Marie Grizzaffi Maynard,et al. Contrasting characteristics and cache performance of technical and multi-user commercial workloads , 1994, ASPLOS VI.
[16] R.W. Brodersen,et al. A dynamic voltage scaled microprocessor system , 2000, IEEE Journal of Solid-State Circuits.
[17] Anastasia Ailamaki,et al. A Case for Staged Database Systems , 2003, CIDR.
[18] S. Parekh,et al. An analysis of database workload performance on simultaneous multithreaded processors , 1998, Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235).
[19] Norman P. Jouppi,et al. Optimizing NUCA Organizations and Wiring Alternatives for Large Caches with CACTI 6.0 , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[20] Thomas Skotnicki,et al. Materials and device structures for sub-32 nm CMOS nodes , 2007 .
[21] David J. DeWitt,et al. DBMSs on a Modern Processor: Where Does Time Go? , 1999, VLDB.
[22] David J. DeWitt,et al. Weaving Relations for Cache Performance , 2001, VLDB.
[23] Mark D. Hill,et al. Amdahl's Law in the Multicore Era , 2008 .
[24] Dirk Grunwald,et al. Prefetching Using Markov Predictors , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[25] T. Sherwood,et al. Predictor-directed stream buffers , 2000, Proceedings 33rd Annual IEEE/ACM International Symposium on Microarchitecture. MICRO-33 2000.
[26] Anastasia Ailamaki,et al. QPipe: a simultaneously pipelined relational query engine , 2005, SIGMOD '05.
[27] Mahmut T. Kandemir,et al. A novel migration-based NUCA design for Chip Multiprocessors , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[28] David A. Wood,et al. ASR: Adaptive Selective Replication for CMP Caches , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[29] Priyadarsan Patra,et al. Impact of Process and Temperature Variations on Network-on-Chip Design Exploration , 2008, Second ACM/IEEE International Symposium on Networks-on-Chip (NOCS 2008).
[30] Josep Torrellas,et al. Reducing remote conflict misses: NUMA with remote cache versus COMA , 1997, Proceedings Third International Symposium on High-Performance Computer Architecture.
[31] William J. Dally,et al. Route packets, not wires: on-chip interconnection networks , 2001, DAC '01.
[32] Jaehyuk Huh,et al. Exploring the design space of future CMPs , 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.
[33] Doug Burger,et al. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches , 2002, ASPLOS X.
[34] James R. Larus,et al. Spending Moore's dividend , 2009, CACM.
[35] Todd C. Mowry,et al. Automatic Compiler-Inserted Prefetching for Pointer-Based Applications , 1999, IEEE Trans. Computers.
[36] Pradeep Dubey,et al. Larrabee: A Many-Core x86 Architecture for Visual Computing , 2009, IEEE Micro.
[37] Jichuan Chang,et al. Cooperative Caching for Chip Multiprocessors , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).
[38] Thomas F. Wenisch,et al. Temporal streaming of shared memory , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[39] Thomas F. Wenisch,et al. SimFlex: a fast, accurate, flexible full-system simulation framework for performance evaluation of server architecture , 2004, PERV.
[40] Babak Falsafi,et al. DBmbench: fast and accurate database workload representation on modern microarchitecture , 2005, CASCON.
[41] Mani Azimi,et al. Integration Challenges and Tradeoffs for Terascale Architectures , 2007 .
[42] David A. Wood,et al. Using compression to improve chip multiprocessor performance , 2006 .
[43] Li Zhao,et al. Towards hybrid last level caches for chip-multiprocessors , 2008, CARN.
[44] Josep Torrellas,et al. BulkSC: bulk enforcement of sequential consistency , 2007, ISCA '07.
[45] Kathryn S. McKinley,et al. Guided region prefetching: a cooperative hardware/software approach , 2003, ISCA '03.
[46] James E. Smith,et al. Data Cache Prefetching Using a Global History Buffer , 2005, IEEE Micro.
[47] William J. Dally,et al. The torus routing chip , 2005, Distributed Computing.
[48] Shekhar Y. Borkar,et al. Microarchitecture and Design Challenges for Gigascale Integration , 2004, MICRO.
[49] Norman P. Jouppi,et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[50] Ching-Te Chuang,et al. Device Footprint Scaling for Ultra Thin Body Fully Depleted SOI , 2007, 8th International Symposium on Quality Electronic Design (ISQED'07).
[51] Thomas F. Wenisch,et al. Mechanisms for store-wait-free multiprocessors , 2007, ISCA '07.
[52] C. Morganti,et al. The asynchronous 24MB on-chip level-3 cache for a dual-core Itanium®-family processor , 2005, ISSCC. 2005 IEEE International Digest of Technical Papers. Solid-State Circuits Conference, 2005.
[53] Josep Torrellas,et al. Using a user-level memory thread for correlation prefetching , 2002, ISCA.
[54] Norman P. Jouppi,et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and pre , 1990, ISCA 1990.
[55] Uri C. Weiser,et al. Utilizing shared data in chip multiprocessors with the Nahalal architecture , 2008, SPAA '08.
[56] Anoop Gupta,et al. Two Techniques to Enhance the Performance of Memory Consistency Models , 1991, ICPP.
[57] T. Mowry,et al. Memory forwarding: enabling aggressive layout optimizations by guaranteeing the safety of data relocation , 1999, Proceedings of the 26th International Symposium on Computer Architecture (Cat. No.99CB36367).
[58] W. Dally,et al. Route packets, not wires: on-chip interconnection networks , 2001, Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232).
[59] Santosh G. Abraham,et al. Store memory-level parallelism optimizations for commercial applications , 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).
[60] Kunle Olukotun,et al. Maximizing CMP throughput with mediocre cores , 2005, 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05).
[61] Martin Hirzel,et al. Dynamic hot data stream prefetching for general-purpose programs , 2002, PLDI '02.
[62] Valentin Puente,et al. SP-NUCA: a cost effective dynamic non-uniform cache architecture , 2008, CARN.
[63] Hui Chen,et al. Electrical and optical on-chip interconnects in scaled microprocessors , 2005, 2005 IEEE International Symposium on Circuits and Systems.
[64] Babak Falsafi,et al. Reactive NUCA: near-optimal block placement and replication in distributed caches , 2009, ISCA '09.
[65] Andreas Moshovos,et al. Dependence based prefetching for linked data structures , 1998, ASPLOS VIII.
[66] Balaram Sinharoy,et al. IBM Power5 chip: a dual-core multithreaded processor , 2004, IEEE Micro.
[67] Babak Falsafi,et al. Database Servers on Chip Multiprocessors: Limitations and Opportunities , 2007, CIDR.
[68] David K. Tam,et al. Managing Shared L2 Caches on Multicore Systems in Software , 2007 .
[69] Hyunjin Lee,et al. A flexible data to L2 cache mapping approach for future multicore processors , 2006, MSPC '06.
[70] Thomas F. Wenisch,et al. Spatial Memory Streaming , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).
[71] Todd C. Mowry,et al. Compiler-based prefetching for recursive data structures , 1996, ASPLOS VII.
[72] Michael Zhang,et al. Victim Migration: Dynamically Adapting Between Private and Shared CMP Caches , 2005 .
[73] Stefanos Kaxiras,et al. Improving CC-NUMA performance using Instruction-based Prediction , 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.
[74] Sangyeun Cho,et al. Managing Distributed, Shared L2 Caches through OS-Level Page Allocation , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[75] Jaehyuk Huh,et al. A NUCA Substrate for Flexible CMP Cache Sharing , 2007, IEEE Transactions on Parallel and Distributed Systems.
[76] Norman P. Jouppi,et al. Single-ISA heterogeneous multi-core architectures for multithreaded workload performance , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.
[77] G. Sohi,et al. Effective jump-pointer prefetching for linked data structures , 1999, Proceedings of the 26th International Symposium on Computer Architecture (Cat. No.99CB36367).
[78] Thomas F. Wenisch,et al. Memory coherence activity prediction in commercial workloads , 2004, WMPI '04.
[79] Mahmut T. Kandemir,et al. Organizing the last line of defense before hitting the memory wall for CMPs , 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).
[80] Mark D. Hill,et al. Virtual hierarchies to support server consolidation , 2007, ISCA '07.
[81] Priyadarsan Patra,et al. Impact of Process and Temperature Variations on Network-on-Chip Design Exploration , 2008 .
[82] Brian Rogers,et al. Scaling the bandwidth wall: challenges in and avenues for CMP scaling , 2009, ISCA '09.
[83] Brad Calder,et al. Reducing cache misses using hardware and software page placement , 1999, ICS '99.
[84] D.A. Wood,et al. Reactive NUMA: A Design For Unifying S-COMA And CC-NUMA , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[85] Luiz André Barroso,et al. Piranha: a scalable architecture based on single-chip multiprocessing , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[86] Per Stenström,et al. An Adaptive Shared/Private NUCA Cache Partitioning Scheme for Chip Multiprocessors , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[87] Gabriel H. Loh,et al. 3D-Stacked Memory Architectures for Multi-core Processors , 2008, 2008 International Symposium on Computer Architecture.
[88] Margaret Martonosi,et al. TCP: tag correlating prefetchers , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings.
[89] S. Tam,et al. A Dual-Core Multi-Threaded Xeon Processor with 16MB L3 Cache , 2006, 2006 IEEE International Solid State Circuits Conference - Digest of Technical Papers.
[90] Krste Asanovic,et al. Victim replication: maximizing capacity while hiding wire delay in tiled chip multiprocessors , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[91] Bruce Jacob,et al. Energy/Power Breakdown of Pipelined Nanometer Caches (90nm/65nm/45nm/32nm) , 2006, ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design.
[92] Glenn Reinman,et al. Fast and fair: data-stream quality of service , 2005, CASES '05.
[93] Thomas F. Wenisch,et al. SimFlex: Statistical Sampling of Computer System Simulation , 2006, IEEE Micro.
[94] Hao Hua,et al. Performance Trend in Three-Dimensional Integrated Circuits , 2006, 2006 International Interconnect Technology Conference.