Memory Systems and Interconnects for Scale-Out Servers
暂无分享,去创建一个
[1] Yoshiyasu Doi,et al. A 3 Watt 39.8–44.6 Gb/s Dual-Mode SFI5.2 SerDes Chip Set in 65 nm CMOS , 2010, IEEE Journal of Solid-State Circuits.
[2] Krisztián Flautner,et al. Evolution of thread-level parallelism in desktop applications , 2010, ISCA.
[3] Qingyuan Deng,et al. MemScale: active low-power modes for main memory , 2011, ASPLOS XVI.
[4] Jung Ho Ahn,et al. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[5] Babak Falsafi,et al. Meet the walkers accelerating index traversals for in-memory databases , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[6] Youngmoon Choi,et al. The next-generation 64b SPARC core in a T4 SoC processor , 2012, 2012 IEEE International Solid-State Circuits Conference.
[7] Babak Falsafi,et al. Clearing the clouds: a study of emerging scale-out workloads on modern hardware , 2012, ASPLOS XVII.
[8] Thomas F. Wenisch,et al. Spatial Memory Streaming , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).
[9] Babak Falsafi,et al. SHIFT: Shared history instruction fetch for lean-core server processors , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[10] William J. Dally,et al. Memory access scheduling , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[11] B. Jacob,et al. Buffer-On-Board Memory System , 2012 .
[12] Onur Mutlu,et al. A case for bufferless routing in on-chip networks , 2009, ISCA '09.
[13] Doe Hyun Yoon,et al. The dynamic granularity memory system , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).
[14] Roland E. Wunderlich,et al. SMARTS: accelerating microarchitecture simulation via rigorous statistical sampling , 2003, 30th Annual International Symposium on Computer Architecture, 2003. Proceedings..
[15] Yan Solihin,et al. CHOP: Adaptive filter-based DRAM caching for CMP server platforms , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.
[16] Gabriel H. Loh,et al. Fundamental Latency Trade-off in Architecting DRAM Caches: Outperforming Impractical SRAM-Tags with a Simple and Practical Design , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[17] Wei-Fen Lin,et al. Reducing DRAM latencies with an integrated memory hierarchy design , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.
[18] Daniel A. Jiménez,et al. Reducing network-on-chip energy consumption through spatial locality speculation , 2011, Proceedings of the Fifth ACM/IEEE International Symposium.
[19] Mikko H. Lipasti,et al. Stealth prefetching , 2006, ASPLOS XII.
[20] Thomas F. Wenisch,et al. SimFlex: Statistical Sampling of Computer System Simulation , 2006, IEEE Micro.
[21] Thomas F. Wenisch,et al. Temporal instruction fetch streaming , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.
[22] Lizy Kurian John,et al. Minimalist open-page: A DRAM page-mode scheduling policy for the many-core era , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[23] Babak Falsafi,et al. A Case for Specialized Processors for Scale-Out Workloads , 2014, IEEE Micro.
[24] Zhe Wang,et al. Improving writeback efficiency with decoupled last-write prediction , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).
[25] Sanjeev Kumar,et al. Exploiting spatial locality in data caches using spatial footprints , 1998, ISCA.
[26] Gabriel H. Loh,et al. 3D-Stacked Memory Architectures for Multi-core Processors , 2008, 2008 International Symposium on Computer Architecture.
[27] Mounir Meghelli,et al. A 16-Gb/s Backplane Transceiver With 12-Tap Current Integrating DFE and Dynamic Adaptation of Voltage Offset and Timing Drifts in 45-nm SOI CMOS Technology , 2012, IEEE J. Solid State Circuits.
[28] Babak Falsafi,et al. Unison Cache: A Scalable and Effective Die-Stacked DRAM Cache , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[29] Giovanni De Micheli,et al. CCNoC: Specializing On-Chip Interconnects for Energy Efficiency in Cache-Coherent Servers , 2012, 2012 IEEE/ACM Sixth International Symposium on Networks-on-Chip.
[30] Lei Jiang,et al. Die Stacking (3D) Microarchitecture , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[31] Onur Mutlu,et al. Express Cube Topologies for on-Chip Interconnects , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.
[32] Babak Falsafi,et al. Toward Dark Silicon in Servers , 2011, IEEE Micro.
[33] Kiyoung Choi,et al. Dynamic power management of off-chip links for Hybrid Memory Cubes , 2014, 2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC).
[34] Willy Zwaenepoel,et al. X-Stream: edge-centric graph processing using streaming partitions , 2013, SOSP.
[35] Babak Falsafi,et al. Reactive NUCA: near-optimal block placement and replication in distributed caches , 2009, ISCA '09.
[36] Bruce Jacob,et al. DRAMSim2: A Cycle Accurate Memory System Simulator , 2011, IEEE Computer Architecture Letters.
[37] Pervez M. Aziz,et al. A 1.0625-to-14.025Gb/s multimedia transceiver with full-rate source-series-terminated transmit driver and floating-tap decision-feedback equalizer in 40nm CMOS , 2011, 2011 IEEE International Solid-State Circuits Conference.
[38] Jung Ho Ahn,et al. Multicore DIMM: an Energy Efficient Memory Module with Independently Controlled DRAMs , 2009, IEEE Computer Architecture Letters.
[39] Vijayalakshmi Srinivasan,et al. Scalable high performance main memory system using phase-change memory technology , 2009, ISCA '09.
[40] Mark Horowitz,et al. Rethinking DRAM Power Modes for Energy Proportionality , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[41] Babak Falsafi,et al. BuMP: Bulk Memory Access Prediction and Streaming , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[42] Parag Agrawal,et al. The case for RAMClouds: scalable high-performance storage entirely in DRAM , 2010, OPSR.
[43] Mark Horowitz,et al. Scaling, Power and the Future of CMOS , 2007, 20th International Conference on VLSI Design held jointly with 6th International Conference on Embedded Systems (VLSID'07).
[44] Brian Rogers,et al. Scaling the bandwidth wall: challenges in and avenues for CMP scaling , 2009, ISCA '09.
[45] Elad Alon,et al. A wide common-mode fully-adaptive multi-standard 12.5Gb/s backplane transceiver in 28nm CMOS , 2012, 2012 Symposium on VLSI Circuits (VLSIC).
[46] Babak Falsafi,et al. Die-stacked DRAM caches for servers: hit ratio, latency, or bandwidth? have it all with footprint cache , 2013, ISCA.
[47] Keiichi Yamamoto,et al. An 8Gb/s Transceiver with 3×-Oversampling 2-Threshold Eye-Tracking CDR Circuit for -36.8dB-loss Backplane , 2008, 2008 IEEE International Solid-State Circuits Conference - Digest of Technical Papers.
[48] Thomas F. Wenisch,et al. Thin servers with smart pipes: designing SoC accelerators for memcached , 2013, ISCA.
[49] Andrew B. Kahng,et al. ORION 2.0: A fast and accurate NoC power and area model for early-stage design space exploration , 2009, 2009 Design, Automation & Test in Europe Conference & Exhibition.
[50] Thomas Toifl,et al. A 28Gb/s 4-tap FFE/15-tap DFE serial link transceiver in 32nm SOI CMOS technology , 2012, 2012 IEEE International Solid-State Circuits Conference.
[51] Amaresh Malipatil,et al. 2.1 28Gb/s 560mW multi-standard SerDes with single-stage analog front-end and 14-tap decision-feedback equalizer in 28nm CMOS , 2014, 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC).
[52] Stefanos Kaxiras,et al. Improving CC-NUMA performance using Instruction-based Prediction , 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.
[53] Yoichi Koyanagi,et al. A 4-channel 10.3Gb/s transceiver with adaptive phase equalizer for 4-to-41dB loss PCB channel , 2011, 2011 IEEE International Solid-State Circuits Conference.
[54] John Wu,et al. A 78mW 11.8Gb/s serial link transceiver with adaptive RX equalization and baud-rate CDR in 32nm CMOS , 2010, 2010 IEEE International Solid-State Circuits Conference - (ISSCC).
[55] Hsien-Hsin S. Lee,et al. An optimized 3D-stacked memory architecture by exploiting excessive, high-density TSV bandwidth , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.
[56] Babak Falsafi,et al. Optimizing Data-Center TCO with Scale-Out Processors , 2012, IEEE Micro.
[57] Avinoam Kolodny,et al. Best of both worlds: A bus enhanced NoC (BENoC) , 2010 .
[58] Kai Shen,et al. Virtual Machine Memory Access Tracing with Hypervisor Exclusive Cache , 2007, USENIX Annual Technical Conference.
[59] Luca P. Carloni,et al. Virtual channels vs. multiple physical networks: A comparative analysis , 2010, Design Automation Conference.
[60] J. Jeddeloh,et al. Hybrid memory cube new DRAM architecture increases density and performance , 2012, 2012 Symposium on VLSI Technology (VLSIT).
[61] Martin L. Kersten,et al. The researcher's guide to the data deluge , 2011, Proc. VLDB Endow..
[62] Jichuan Chang,et al. BOOM: Enabling mobile memory based low-power server DIMMs , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).
[63] G. Tyson,et al. Eager writeback-a technique for improving bandwidth utilization , 2000, Proceedings 33rd Annual IEEE/ACM International Symposium on Microarchitecture. MICRO-33 2000.
[64] James E. Smith,et al. Fair Queuing Memory Systems , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[65] Christoforos E. Kozyrakis,et al. Towards energy-proportional datacenter memory with mobile DRAM , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).
[66] Hugh Mair,et al. Analog-DFE-based 16Gb/s SerDes in 40nm CMOS that operates across 34dB loss channels at Nyquist with a baud rate CDR and 1.2Vpp voltage-mode driver , 2011, 2011 IEEE International Solid-State Circuits Conference.
[67] Sandhya Dwarkadas,et al. Amoeba-Cache: Adaptive Blocks for Eliminating Waste in the Memory Hierarchy , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[68] Thomas Krause,et al. A 12.5Gb/s SerDes in 65nm CMOS Using a Baud-Rate ADC with Digital Receiver Equalization and Clock Recovery , 2007, 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers.
[69] William J. Dally,et al. Design tradeoffs for tiled CMP on-chip networks , 2006, ICS '06.
[70] Luiz André Barroso,et al. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines , 2009, The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines.
[71] Trevor Mudge,et al. Scaling High-Performance Interconnect Architectures to Many-Core Systems , 2012 .
[72] M. Horowitz,et al. A 14-mW 6.25-Gb/s Transceiver in 90-nm CMOS , 2007, IEEE Journal of Solid-State Circuits.
[73] Babak Falsafi,et al. Scale-out NUMA , 2014, ASPLOS.
[74] Lizy Kurian John,et al. The virtual write queue: coordinating DRAM and last-level cache policies , 2010, ISCA.
[75] Li-Shiuan Peh,et al. Flow control and micro-architectural mechanisms for extending the performance of interconnection networks , 2001 .
[76] Ayal Shoval,et al. An 8.4mW/Gb/s 4-lane 48Gb/s multi-standard-compliant transceiver in 40nm digital CMOS technology , 2011, 2011 IEEE International Solid-State Circuits Conference.
[77] Vishnu Balan,et al. 26.1 A 130mW 20Gb/s half-duplex serial link in 28nm CMOS , 2014, 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC).
[78] George Michelogiannakis,et al. Elastic-buffer flow control for on-chip networks , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.
[79] Babak Falsafi,et al. Power Scaling: the Ultimate Obstacle to 1K-Core Chips , 2010 .
[80] Babak Falsafi,et al. Database Servers on Chip Multiprocessors: Limitations and Opportunities , 2007, CIDR.
[81] Babak Falsafi,et al. Scale-out processors , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).
[82] Michael Ferdman,et al. Architectural Support for Dynamic Linking , 2015, ASPLOS.
[83] Krishna P. Gummadi,et al. A measurement-driven analysis of information propagation in the flickr social network , 2009, WWW '09.
[84] Song Jiang,et al. Workload analysis of a large-scale key-value store , 2012, SIGMETRICS '12.
[85] Mark D. Hill,et al. Efficiently enabling conventional block sizes for very large die-stacked DRAM caches , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[86] Kathryn S. McKinley,et al. Guided region prefetching: a cooperative hardware/software approach , 2003, ISCA '03.
[87] Eisse Mensink,et al. Low-Power, High-Speed Transceivers for Network-on-Chip Communication , 2009, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[88] R. Brama,et al. A multi standard 1.5 to 10Gb/s latch-based 3-tap DFE receiver with a SSC tolerant CDR for serial backplane communication , 2008, 2008 IEEE Symposium on VLSI Circuits.
[89] Mor Harchol-Balter,et al. Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[90] David W. Nellans,et al. Micro-pages: increasing DRAM efficiency with locality-aware data placement , 2010, ASPLOS XV.
[91] Sriram Sankar,et al. Server Engineering Insights for Large-Scale Online Services , 2010, IEEE Micro.
[92] Thomas F. Wenisch,et al. PowerNap: eliminating server idle power , 2009, ASPLOS.
[93] Sharad Malik,et al. Power-driven design of router microarchitectures in on-chip networks , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..
[94] Jiangchuan Liu,et al. Statistics and Social Network of YouTube Videos , 2008, 2008 16th Interntional Workshop on Quality of Service.
[95] Herschel A. Ainspan,et al. A 78mW 11.1Gb/s 5-tap DFE receiver with digitally calibrated current-integrating summers in 65nm CMOS , 2009, 2009 IEEE International Solid-State Circuits Conference - Digest of Technical Papers.
[96] Hong Liu,et al. Energy proportional datacenter networks , 2010, ISCA.
[97] Gang Lu,et al. Characterization of real workloads of web search engines , 2011, 2011 IEEE International Symposium on Workload Characterization (IISWC).
[98] Norman P. Jouppi,et al. Optimizing NUCA Organizations and Wiring Alternatives for Large Caches with CACTI 6.0 , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[99] Jung Ho Ahn,et al. CACTI-P: Architecture-level modeling for SRAM-based structures with advanced leakage reduction techniques , 2011, 2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[100] Onur Mutlu,et al. Kilo-NOC: A heterogeneous network-on-chip architecture for scalability and service guarantees , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).
[101] Feifei Li,et al. NDC: Analyzing the impact of 3D-stacked memory+logic devices on MapReduce workloads , 2014, 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[102] Babak Falsafi,et al. TurboTag: Lookup filtering to reduce coherence directory power , 2010, 2010 ACM/IEEE International Symposium on Low-Power Electronics and Design (ISLPED).
[103] Henry Hoffmann,et al. The Raw Microprocessor: A Computational Fabric for Software Circuits and General-Purpose Programs , 2002, IEEE Micro.
[104] Joseph Kennedy,et al. 26.2 A 205mW 32Gb/s 3-Tap FFE/6-tap DFE bidirectional serial link in 22nm CMOS , 2014, 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC).
[105] Dam Sunwoo,et al. Balancing DRAM locality and parallelism in shared memory CMP systems , 2012, IEEE International Symposium on High-Performance Comp Architecture.
[106] Benjamin C. Lee,et al. Disintegrated control for energy-efficient and heterogeneous memory systems , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).
[107] Babak Falsafi,et al. Accurate and complexity-effective spatial pattern prediction , 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).
[108] Chris Fallin,et al. Memory power management via dynamic voltage/frequency scaling , 2011, ICAC '11.
[109] Hui Ding,et al. TAO: Facebook's Distributed Data Store for the Social Graph , 2013, USENIX Annual Technical Conference.