Memory region: a system abstraction for managing the complex memory structures of multicore platforms
