Design and analysis of spatially-partitioned shared caches
暂无分享,去创建一个
[1] Anant Agarwal,et al. Core Count vs Cache Size for Manycore Architectures in the Cloud , 2010 .
[2] Karthikeyan Sankaralingam,et al. Dark Silicon and the End of Multicore Scaling , 2012, IEEE Micro.
[3] David H. Albonesi,et al. Selective cache ways: on-demand cache resource allocation , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.
[4] Yan Solihin,et al. A Framework for Providing Quality of Service in Chip Multi-Processors , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[5] 吉田 利雄. SPARC64 XIfx: Fujitsu's Next Generation Processor for HPC , 2014 .
[6] David A. Wood,et al. A Comparison of Trace-Sampling Techniques for Multi-Megabyte Caches , 1994, IEEE Trans. Computers.
[7] Mark D. Hill,et al. Virtual hierarchies to support server consolidation , 2007, ISCA '07.
[8] Irving L. Traiger,et al. Evaluation Techniques for Storage Hierarchies , 1970, IBM Syst. J..
[9] Avraham A. Melkman,et al. On-Line Construction of the Convex Hull of a Simple Polyline , 1987, Inf. Process. Lett..
[10] Zeshan Chishti,et al. Optimizing replication, communication, and capacity allocation in CMPs , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[11] Efraim Rotem,et al. Power-Management Architecture of the Intel Microarchitecture Code-Named Sandy Bridge , 2012, IEEE Micro.
[12] Donald Yeung,et al. Studying multicore processor scaling via reuse distance analysis , 2013, ISCA.
[13] Thomas R. Gross,et al. Memory system performance in a NUMA multicore multiprocessor , 2011, SYSTOR '11.
[14] David A. Wood,et al. IPC Considered Harmful for Multiprocessor Workloads , 2006, IEEE Micro.
[15] Donald E. Knuth,et al. Axioms and Hulls , 1992, Lecture Notes in Computer Science.
[16] Hsien-Hsin S. Lee,et al. An optimized 3D-stacked memory architecture by exploiting excessive, high-density TSV bandwidth , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.
[17] Onur Mutlu,et al. Asymmetry-aware execution placement on manycore chips , 2013 .
[18] Christoforos E. Kozyrakis,et al. ZSim: fast and accurate microarchitectural simulation of thousand-core systems , 2013, ISCA.
[19] Lei Jiang,et al. Die Stacking (3D) Microarchitecture , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[20] Christopher R. Johnson,et al. PIKA: A Network Service for Multikernel Operating Systems , 2014 .
[21] Karthikeyan Sankaralingam,et al. A general constraint-centric scheduling framework for spatial architectures , 2013, PLDI.
[22] Luiz André Barroso,et al. The tail at scale , 2013, CACM.
[23] Babak Falsafi,et al. Die-stacked DRAM caches for servers: hit ratio, latency, or bandwidth? have it all with footprint cache , 2013, ISCA.
[24] Valentin Puente,et al. ESP-NUCA: A low-cost adaptive Non-Uniform Cache Architecture , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.
[25] Dean M. Tullsen,et al. Inter-core prefetching for multicore processors using migrating helper threads , 2011, ASPLOS XVI.
[26] Mark Horowitz,et al. An analytical cache model , 1989, TOCS.
[27] Hyunjin Lee,et al. CloudCache: Expanding and shrinking private caches , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[28] M TullsenDean,et al. Symbiotic jobscheduling for a simultaneous mutlithreading processor , 2000 .
[29] Christoforos E. Kozyrakis,et al. Heracles: Improving resource efficiency at scale , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[30] Sanjeev Kumar,et al. Dynamic tracking of page miss ratio curve for memory management , 2004, ASPLOS XI.
[31] Mateo Valero,et al. Improving Cache Management Policies Using Dynamic Reuse Distances , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[32] Armand M. Makowski,et al. Optimal replacement policies for nonuniform cache objects with optional eviction , 2003, IEEE INFOCOM 2003. Twenty-second Annual Joint Conference of the IEEE Computer and Communications Societies (IEEE Cat. No.03CH37428).
[33] Peter J. Denning,et al. Thrashing: its causes and prevention , 1968, AFIPS Fall Joint Computing Conference.
[34] Yutao Zhong,et al. Predicting whole-program locality through reuse distance analysis , 2003, PLDI.
[35] Reetuparna Das,et al. Application-to-core mapping policies to reduce memory system interference in multi-core systems , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).
[36] Kei Hiraki,et al. Inter-reference gap distribution replacement: an improved replacement algorithm for set-associative caches , 2004, ICS '04.
[37] Sangyeun Cho,et al. Managing Distributed, Shared L2 Caches through OS-Level Page Allocation , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[38] Easwaran Raman,et al. MAO — An extensible micro-architectural optimizer , 2011, International Symposium on Code Generation and Optimization (CGO 2011).
[39] Nathan Beckmann,et al. A Cache Model for Modern Processors , 2015 .
[40] Norman P. Jouppi,et al. Reconfigurable caches and their application to media processing , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[41] Kai Li,et al. The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[42] Larry Carter,et al. Universal classes of hash functions (Extended Abstract) , 1977, STOC '77.
[43] Lizhong Chen,et al. Futility Scaling: High-Associativity Cache Partitioning , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[44] Jean Roman,et al. SCOTCH: A Software Package for Static Mapping by Dual Recursive Bipartitioning of Process and Architecture Graphs , 1996, HPCN Europe.
[45] Saman P. Amarasinghe,et al. Maps: a compiler-managed memory system for Raw machines , 1999, Proceedings of the 26th International Symposium on Computer Architecture (Cat. No.99CB36367).
[46] Jaehyuk Huh,et al. A NUCA Substrate for Flexible CMP Cache Sharing , 2007, IEEE Transactions on Parallel and Distributed Systems.
[47] Benoît Dupont de Dinechin,et al. A clustered manycore processor architecture for embedded and accelerated applications , 2013, 2013 IEEE High Performance Extreme Computing Conference (HPEC).
[48] Aamer Jaleel,et al. CRUISE: cache replacement and utility-aware scheduling , 2012, ASPLOS XVII.
[49] Nathan Beckmann,et al. Jigsaw: Scalable software-defined caches , 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.
[50] Anne Condon,et al. On the Undecidability of Probabilistic Planning and Infinite-Horizon Partially Observable Markov Decision Problems , 1999, AAAI/IAAI.
[51] Alfred V. Aho,et al. Principles of Optimal Page Replacement , 1971, J. ACM.
[52] Yuan Xie,et al. Simple but Effective Heterogeneous Main Memory with On-Chip Memory Controller Support , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[53] David A. Patterson,et al. Computer Architecture: A Quantitative Approach , 1969 .
[54] Yan Solihin,et al. QoS policies and architecture for cache/memory in CMP platforms , 2007, SIGMETRICS '07.
[55] Jung Ho Ahn,et al. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[56] Li-Shiuan Peh,et al. CoQoS: Coordinating QoS-aware shared resources in NoC-based SoCs , 2011, J. Parallel Distributed Comput..
[57] William J. Dally,et al. Buffer-space efficient and deadlock-free scheduling of stream applications on multi-core architectures , 2010, SPAA '10.
[58] Daniel Sánchez,et al. Ubik: efficient cache sharing with strict qos for latency-critical workloads , 2014, ASPLOS.
[59] Anant Agarwal,et al. An operating system for multicore and clouds: mechanisms and implementation , 2010, SoCC '10.
[60] Mikko H. Lipasti,et al. Tag tables , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[61] M. Javidi,et al. Iterative methods to nonlinear equations , 2007, Appl. Math. Comput..
[62] K. Steinhubl. Design of Ion-Implanted MOSFET'S with Very Small Physical Dimensions , 1974 .
[63] Per Stenström,et al. An Adaptive Shared/Private NUCA Cache Partitioning Scheme for Chip Multiprocessors , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[64] Moinuddin K. Qureshi. Adaptive Spill-Receive for robust high-performance caching in CMPs , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.
[65] José González,et al. Distributed Cooperative Caching , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[66] André Seznec,et al. A case for two-way skewed-associative caches , 1993, ISCA '93.
[67] William J. Dally,et al. SLIP: Reducing wire energy in the memory hierarchy , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[68] David A. Patterson,et al. A hardware evaluation of cache partitioning to improve utilization and energy-efficiency while preserving responsiveness , 2013, ISCA.
[69] Christoforos E. Kozyrakis,et al. Vantage: Scalable and efficient fine-grain cache partitioning , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).
[70] Christoforos E. Kozyrakis,et al. The ZCache: Decoupling Ways and Associativity , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[71] Aamer Jaleel,et al. Last level cache (LLC) performance of data mining workloads on a CMP - a case study of parallel bioinformatics workloads , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006..
[72] Charles M. Grinstead,et al. Introduction to probability , 1999, Statistics for the Behavioural Sciences.
[73] Aamer Jaleel,et al. The gradient-based cache partitioning algorithm , 2012, TACO.
[74] Vipin Kumar,et al. A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs , 1998, SIAM J. Sci. Comput..
[75] Xiao Zhang,et al. CPI2: CPU performance isolation for shared compute clusters , 2013, EuroSys '13.
[76] Onur Mutlu,et al. Exploiting compressed block size as an indicator of future reuse , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[77] Richard B. Brown,et al. Congestion driven quadratic placement , 1998, Proceedings 1998 Design and Automation Conference. 35th DAC. (Cat. No.98CH36175).
[78] Jack Doweck,et al. Inside Intel® Core microarchitecture , 2006, 2006 IEEE Hot Chips 18 Symposium (HCS).
[79] Aamer Jaleel,et al. Adaptive insertion policies for managing shared caches , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[80] Robert E. Tarjan,et al. Amortized efficiency of list update and paging rules , 1985, CACM.
[81] Aamer Jaleel,et al. Adaptive insertion policies for high performance caching , 2007, ISCA '07.
[82] Guy E. Blelloch,et al. Brief announcement: the problem based benchmark suite , 2012, SPAA '12.
[83] Mahmut T. Kandemir,et al. A case for integrated processor-cache partitioning in chip multiprocessors , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[84] George Kurian,et al. ATAC: Improving performance and programmability with on-chip optical networks , 2010, Proceedings of 2010 IEEE International Symposium on Circuits and Systems.
[85] Babak Falsafi,et al. Reactive NUCA: near-optimal block placement and replication in distributed caches , 2009, ISCA '09.
[86] Srinivas Devadas,et al. Dynamic Cache Partitioning for CMP/SMT Systems , 2004 .
[87] George Kurian,et al. Graphite: A distributed parallel simulator for multicores , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.
[88] Mark D. Hill,et al. Efficiently enabling conventional block sizes for very large die-stacked DRAM caches , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[89] Gabriel H. Loh,et al. PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches , 2009, ISCA '09.
[90] Srinivas Devadas,et al. Application-specific memory management for embedded systems using software-controlled caches , 2000, Proceedings 37th Design Automation Conference.
[91] Yan Solihin,et al. CHOP: Adaptive filter-based DRAM caching for CMP server platforms , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.
[92] Gabriel H. Loh,et al. Fundamental Latency Trade-off in Architecting DRAM Caches: Outperforming Impractical SRAM-Tags with a Simple and Practical Design , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[93] Randolph Kirchain,et al. A roadmap for nanophotonics , 2007 .
[94] Sarah Bird. PACORA : Performance Aware Convex Optimization for Resource Allocation , 2011 .
[95] Koen De Bosschere,et al. XOR-based hash functions , 2005, IEEE Transactions on Computers.
[96] Aamer Jaleel,et al. High performance cache replacement using re-reference interval prediction (RRIP) , 2010, ISCA.
[97] Yale N. Patt,et al. Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[98] Bharadwaj S. Amrutur,et al. Molecular Caches: A caching structure for dynamic creation of application-specific Heterogeneous cache regions , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[99] Lingjia Tang,et al. Bubble-flux: precise online QoS management for increased utilization in warehouse scale computers , 2013, ISCA.
[100] Krste Asanovic,et al. Victim replication: maximizing capacity while hiding wire delay in tiled chip multiprocessors , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[101] I. Newton. Philosophiæ naturalis principia mathematica , 1973 .
[102] Anoop Gupta,et al. Operating system support for improving data locality on CC-NUMA compute servers , 1996, ASPLOS VII.
[103] Emery D. Berger,et al. STABILIZER: statistically sound performance evaluation , 2013, ASPLOS '13.
[104] Matthias Hauswirth,et al. Producing wrong data without doing anything obviously wrong! , 2009, ASPLOS.
[105] Stefanos Kaxiras,et al. Where replacement algorithms fail: a thorough analysis , 2010, CF '10.
[106] Amir Roth,et al. FIESTA: A Sample-Balanced Multi-Program Workload Methodology , 2009 .
[107] Anant Agarwal,et al. The case for elastic operating system services in fos , 2012, DAC Design Automation Conference 2012.
[108] Laszlo A. Belady,et al. A Study of Replacement Algorithms for Virtual-Storage Computer , 1966, IBM Syst. J..
[109] George Kurian,et al. Efficient Cache Coherence on Manycore Optical Networks , 2010 .
[110] Christoforos E. Kozyrakis,et al. Comparing memory systems for chip multiprocessors , 2007, ISCA '07.
[111] Christopher Mozak,et al. Westmere: A family of 32nm IA processors , 2010, 2010 IEEE International Solid-State Circuits Conference - (ISSCC).
[112] Alexandra Fedorova,et al. A case for NUMA-aware contention management on multicore systems , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[113] George E. Monahan,et al. A Survey of Partially Observable Markov Decision Processes: Theory, Models, and Algorithms , 2007 .
[114] Parag Agrawal,et al. The case for RAMClouds: scalable high-performance storage entirely in DRAM , 2010, OPSR.
[115] Margaret Martonosi,et al. Wattch: a framework for architectural-level power analysis and optimizations , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[116] Anant Agarwal,et al. A Unified Operating System for Clouds and Manycore: fos , 2009 .
[117] D. F. Wong,et al. Simulated Annealing for VLSI Design , 1988 .
[118] David A. Wood,et al. Reuse-based online models for caches , 2013, SIGMETRICS '13.
[119] David A. Wood,et al. Managing Wire Delay in Large Chip-Multiprocessor Caches , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).
[120] Stephen P. Boyd,et al. Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.
[121] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[122] Francisco J. Cazorla,et al. FlexDCP: a QoS framework for CMP architectures , 2009, OPSR.
[123] Daniel Sánchez,et al. Talus: A simple way to remove cliffs in cache performance , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[124] Varghese George,et al. Power management of the third generation intel core micro architecture formerly codenamed ivy bridge , 2012, 2012 IEEE Hot Chips 24 Symposium (HCS).
[125] Doug Burger,et al. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches , 2002, ASPLOS X.
[126] P. Spreij. Probability and Measure , 1996 .
[127] M. Martonosi,et al. A Comparison of Capacity Management Schemes for Shared CMP Caches , 2008 .
[128] Harish Patil,et al. Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.
[129] Jichuan Chang,et al. Cooperative Caching for Chip Multiprocessors , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).
[130] Zhe Wang,et al. Decoupled dynamic cache segmentation , 2012, IEEE International Symposium on High-Performance Comp Architecture.
[131] Stijn Eyerman,et al. System-Level Performance Metrics for Multiprogram Workloads , 2008, IEEE Micro.
[132] David A. Wood,et al. ASR: Adaptive Selective Replication for CMP Caches , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[133] Dan Page,et al. Partitioned Cache Architecture as a Side-Channel Defence Mechanism , 2005, IACR Cryptology ePrint Archive.
[134] Carole-Jean Wu,et al. SHiP: Signature-based Hit Predictor for high performance caching , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[135] Mahmut T. Kandemir,et al. SHARP control: Controlled shared cache management in chip multiprocessors , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[136] Michael Stumm,et al. RapidMRC: approximating L2 miss rate curves on commodity systems for online optimizations , 2009, ASPLOS.
[137] José González,et al. Elastic cooperative caching: an autonomous dynamically adaptive memory hierarchy for chip multiprocessors , 2010, ISCA.
[138] Alexandra Fedorova,et al. Addressing shared resource contention in multicore processors via scheduling , 2010, ASPLOS XV.
[139] Daniel A. Jiménez. Insertion and promotion for tree-based PseudoLRU last-level caches , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[140] R. Govindarajan,et al. Probabilistic Shared Cache Management (PriSM) , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).
[141] Michael Stumm,et al. Thread clustering: sharing-aware scheduling on SMP-CMP-SMT multiprocessors , 2007, EuroSys '07.
[142] Babak Falsafi,et al. Database Servers on Chip Multiprocessors: Limitations and Opportunities , 2007, CIDR.
[143] David K. Tam,et al. Managing Shared L2 Caches on Multicore Systems in Software , 2007 .
[144] Vikas Agarwal,et al. Clock rate versus IPC: the end of the road for conventional microarchitectures , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[145] Hans J. Herrmann,et al. Geometrical cluster growth models and kinetic gelation , 1986 .
[146] Stefanos Kaxiras,et al. Cache replacement based on reuse-distance prediction , 2007, 2007 25th International Conference on Computer Design.
[147] Belliappa Kuttanna,et al. A Sub-1W to 2W Low-Power IA Processor for Mobile Internet Devices and Ultra-Mobile PCs in 45nm Hi-Κ Metal Gate CMOS , 2008, 2008 IEEE International Solid-State Circuits Conference - Digest of Technical Papers.
[148] Jaejin Lee,et al. Using prime numbers for cache indexing to eliminate conflict misses , 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).
[149] Zhao Zhang,et al. Gaining insights into multicore cache partitioning: Bridging the gap between simulation and real systems , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.
[150] Kevin Skadron,et al. Bubble-up: Increasing utilization in modern warehouse scale computers via sensible co-locations , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[151] Vijay S. Pai,et al. Imbalanced cache partitioning for balanced data-parallel programs , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[152] William J. Dally,et al. GPUs and the Future of Parallel Computing , 2011, IEEE Micro.
[153] Thomas F. Wenisch,et al. Unlocking bandwidth for GPUs in CC-NUMA systems , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[154] Daniel Sánchez,et al. Scaling distributed cache hierarchies through computation and data co-scheduling , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[155] Vivien Quéma,et al. Traffic management: a holistic approach to memory placement on NUMA systems , 2013, ASPLOS '13.