AsmDB: Understanding and Mitigating Front-End Stalls in Warehouse-Scale Computers
Christoforos E. Kozyrakis | David I. August | Tipp Moseley | Parthasarathy Ranganathan | Svilen Kanev | Heiner Litz | Hyoun Kyu Cho | Grant Ayers | Nayana Prasad Nagendra | Trivikram Krishnamurthy
[1] Christoforos E. Kozyrakis,et al. ZSim: fast and accurate microarchitectural simulation of thousand-core systems , 2013, ISCA.
[2] Parthasarathy Ranganathan,et al. The Datacenter as a Computer: Designing Warehouse-Scale Machines, Third Edition , 2018, The Datacenter as a Computer.
[3] Muthu Dayalan,et al. MapReduce: Simplified Data Processing on Large Cluster , 2018 .
[4] Sanjay Ghemawat,et al. MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.
[5] Babak Falsafi,et al. Clearing the clouds: a study of emerging scale-out workloads on modern hardware , 2012, ASPLOS XVII.
[6] Babak Falsafi,et al. Confluence: Unified instruction supply for scale-out servers , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[7] Cheng-Chieh Huang,et al. Boomerang: A Metadata-Free Architecture for Control Flow Delivery , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[8] Thomas F. Wenisch,et al. Temporal instruction fetch streaming , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.
[9] Boris Grot,et al. Blasting through the Front-End Bottleneck with Shotgun , 2018, ASPLOS.
[10] Harish Patil,et al. Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.
[11] Todd C. Mowry,et al. Cooperative prefetching: compiler and hardware support for effective instruction prefetching in modern processors , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[12] Guilherme Ottoni,et al. BOLT: A Practical Binary Optimizer for Data Centers and Beyond , 2018, 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[13] T. K. Prakash,et al. Performance Characterization of SPEC CPU 2006 Benchmarks on Intel Core 2 Duo Processor , .
[14] Gu-Yeon Wei,et al. Profiling a warehouse-scale computer , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[15] David A. Patterson,et al. Technical perspective: the data center is the computer , 2008, CACM.
[16] Derek Bruening,et al. Efficient, transparent, and comprehensive runtime code manipulation , 2004 .
[17] Tipp Moseley,et al. AutoFDO: Automatic feedback-directed optimization for warehouse-scale applications , 2016, 2016 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[18] Robert Hundt,et al. Loop Recognition in C++/Java/Go/Scala , 2011 .
[19] Thomas F. Wenisch,et al. RDIP: Return-address-stack Directed Instruction Prefetching , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[20] Glenn Reinman,et al. Fetch directed instruction prefetching , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.
[21] David A. Patterson,et al. Computer Architecture, Fifth Edition: A Quantitative Approach , 2011 .
[22] David A. Patterson,et al. Computer Architecture: A Quantitative Approach , 1969 .
[23] Ahmad Yasin,et al. A Top-Down method for performance analysis and counters architecture , 2014, 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[24] Jeffrey Dean,et al. Transparent, low-overhead profiling on modern processors , 1998 .
[25] Babak Falsafi,et al. SHIFT: Shared history instruction fetch for lean-core server processors , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[26] Robert S. Cohn,et al. Hot cold optimization of large Windows/NT applications , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.
[27] Lori L. Pollock,et al. A Region-based Partial Inlining Algorithm for an ILP Optimizing Compiler , 2002, PDPTA.
[28] Daniel Sánchez,et al. Tailbench: a benchmark suite and evaluation methodology for latency-critical applications , 2016, 2016 IEEE International Symposium on Workload Characterization (IISWC).
[29] Gang Ren,et al. Google-Wide Profiling: A Continuous Profiling Infrastructure for Data Centers , 2010, IEEE Micro.
[30] Paul Havlak,et al. Nesting of reducible and irreducible loops , 1997, TOPLAS.
[31] Anastasia Ailamaki,et al. Clearing the clouds , 2012 .
[32] Babak Falsafi,et al. Proactive instruction fetch , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[33] Stéphane Eranian. What can performance counters do for memory subsystem analysis? , 2008, MSPC '08.
[34] Martin Fleury,et al. Software-Controlled Instruction Prefetch Buffering for Low-End Processors , 2015, J. Circuits Syst. Comput..
[35] Chunjie Luo,et al. Characterizing data analysis workloads in data centers , 2013, 2013 IEEE International Symposium on Workload Characterization (IISWC).
[36] Christoforos E. Kozyrakis,et al. Memory Hierarchy for Web Search , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).