Memory Hierarchy for Web Search
暂无分享,去创建一个
Christoforos E. Kozyrakis | Jung Ho Ahn | Parthasarathy Ranganathan | Grant Ayers | C. Kozyrakis | Christos Kozyrakis | Parthasarathy Ranganathan | Grant Ayers
[1] John Paul Shen,et al. Hardware Support for Prescient Instruction Prefetch , 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).
[2] Cheng-Chieh Huang,et al. ATCache: Reducing DRAM cache latency via a small SRAM tag cache , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).
[3] Babak Falsafi,et al. Clearing the clouds: a study of emerging scale-out workloads on modern hardware , 2012, ASPLOS XVII.
[4] Babak Falsafi,et al. Fat Caches for Scale-Out Servers , 2017, IEEE Micro.
[5] Gang Ren,et al. Google-Wide Profiling: A Continuous Profiling Infrastructure for Data Centers , 2010, IEEE Micro.
[6] David Wentzlaff,et al. Processor: A 64-Core SoC with Mesh Interconnect , 2010 .
[7] Kushagra Vaid,et al. Web search using mobile cores: quantifying and mitigating the price of efficiency , 2010, ISCA.
[8] Harish Patil,et al. Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.
[9] Jeffrey Dean,et al. Challenges in building large-scale information retrieval systems: invited talk , 2009, WSDM '09.
[10] Todd C. Mowry,et al. Cooperative prefetching: compiler and hardware support for effective instruction prefetching in modern processors , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[11] Christoforos E. Kozyrakis,et al. Towards energy proportionality for large-scale latency-critical workloads , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).
[12] Gabriel H. Loh,et al. A Mostly-Clean DRAM Cache for Effective Hit Speculation and Self-Balancing Dispatch , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[13] Sergey Brin,et al. The Anatomy of a Large-Scale Hypertextual Web Search Engine , 1998, Comput. Networks.
[14] Babak Falsafi,et al. Die-stacked DRAM caches for servers: hit ratio, latency, or bandwidth? have it all with footprint cache , 2013, ISCA.
[15] Babak Falsafi,et al. Unison Cache: A Scalable and Effective Die-Stacked DRAM Cache , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[16] Pradeep Dubey,et al. Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-Core CPUs , 2009, Proc. VLDB Endow..
[17] Sriram Sankar,et al. Server Engineering Insights for Large-Scale Online Services , 2010, IEEE Micro.
[18] Babak Falsafi,et al. Scale-out NUMA , 2014, ASPLOS.
[19] David A. Patterson,et al. In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[20] William J. Bowhill,et al. The Xeon® Processor E5-2600 v3: a 22 nm 18-Core Product Family , 2016, IEEE Journal of Solid-State Circuits.
[21] Yen-Chen Liu,et al. Knights Landing: Second-Generation Intel Xeon Phi Product , 2016, IEEE Micro.
[22] Kevin Zhang,et al. 2nd generation embedded DRAM with 4X lower self refresh power in 22nm Tri-Gate CMOS technology , 2014, 2014 Symposium on VLSI Circuits Digest of Technical Papers.
[23] Babak Falsafi,et al. Scale-out processors , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).
[24] Babak Falsafi,et al. Proactive instruction fetch , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[25] Jung Ho Ahn,et al. A Comprehensive Memory Modeling Tool and Its Application to the Design and Analysis of Future Memory Hierarchies , 2008, 2008 International Symposium on Computer Architecture.
[26] Luiz André Barroso,et al. Memory system characterization of commercial workloads , 1998, ISCA.
[27] Mark D. Hill,et al. Efficiently enabling conventional block sizes for very large die-stacked DRAM caches , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[28] Luiz André Barroso,et al. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Second Edition , 2013, The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Second Edition.
[29] John L. Henning. SPEC CPU2006 benchmark descriptions , 2006, CARN.
[30] Ahmad Yasin,et al. A Top-Down method for performance analysis and counters architecture , 2014, 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[31] Sanjay Ghemawat,et al. MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.
[32] Allison Woodruff,et al. An Investigation of Documents from the World Wide Web , 1996, Comput. Networks.
[33] Hari Angepat,et al. A cloud-scale acceleration architecture , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[34] Wilfred Gomes,et al. 5.9 Haswell: A family of IA 22nm processors , 2014, 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC).
[35] Christoforos E. Kozyrakis,et al. Heracles: Improving resource efficiency at scale , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[36] Babak Falsafi,et al. SHIFT: Shared history instruction fetch for lean-core server processors , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[37] Francisco J. Cazorla,et al. Making data prefetch smarter: Adaptive prefetching on POWER7 , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).
[38] Carole-Jean Wu,et al. PACMan: Prefetch-Aware Cache Management for high performance caching , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[39] Josep Torrellas,et al. The memory performance of DSS commercial workloads in shared-memory multiprocessors , 1997, Proceedings Third International Symposium on High-Performance Computer Architecture.
[40] Onur Mutlu,et al. Enabling Efficient and Scalable Hybrid Memories Using Fine-Granularity DRAM Cache Management , 2012, IEEE Computer Architecture Letters.
[41] Gabriel H. Loh,et al. Fundamental Latency Trade-off in Architecting DRAM Caches: Outperforming Impractical SRAM-Tags with a Simple and Practical Design , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[42] Bruce Jacob,et al. Technology comparison for large last-level caches (L3Cs): Low-leakage SRAM, low write-energy STT-RAM, and refresh-optimized eDRAM , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).
[43] Christoforos E. Kozyrakis,et al. Towards energy-proportional datacenter memory with mobile DRAM , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).
[44] Trevor N. Mudge,et al. Understanding and Designing New Server Architectures for Emerging Warehouse-Computing Environments , 2008, 2008 International Symposium on Computer Architecture.
[45] Luiz André Barroso,et al. Web Search for a Planet: The Google Cluster Architecture , 2003, IEEE Micro.
[46] Thomas F. Wenisch,et al. RDIP: Return-address-stack Directed Instruction Prefetching , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[47] Stéphan Jourdan,et al. Haswell: The Fourth-Generation Intel Core Processor , 2014, IEEE Micro.
[48] David Daly,et al. The cache and memory subsystems of the IBM POWER8 processor , 2015, IBM J. Res. Dev..
[49] Thomas F. Wenisch,et al. Power management of online data-intensive services , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).
[50] Gu-Yeon Wei,et al. Profiling a Warehouse-Scale Computer , 2016, IEEE Micro.
[51] Ronald G. Dreslinski,et al. Sirius: An Open End-to-End Voice and Vision Personal Assistant and Its Implications for Future Warehouse Scale Computers , 2015, ASPLOS.