TriKon: A hypervisor aware manycore processor

Virtualization is increasingly being deployed to run applications in a cloud computing environment. Sadly, there are overheads associated with hypervisors that can prohibitively reduce application performance. A major source of the overheads is the destructive interference between the application, OS, and hypervisor in the memory system. We characterize such overheads in this paper, and propose the design of a novel Triangle cache that can effectively mitigate destructive interference across these three classes of workloads. We subsequently, proceed to design the TriKon manycore processor that consists of a set of heterogeneous cores with caches of different sizes, and Triangle caches. To maximize the throughput of the system as a whole, we propose a dynamic scheduling algorithm for scheduling a class of system and CPU intensive applications on the set of heterogeneous cores. The area of the TriKon processor is within 2% of a baseline processor, and with such a system, we could achieve a performance gain of 12% for a suite of benchmarks. Within this suite, the system intensive benchmarks show a performance gain of 20% while the performance of the compute intensive ones remains unaffected. Also, by allocating extra area for cores with sophisticated cache designs, we further improved the performance of the system intensive benchmarks to 30%.

[1]  John L. Henning SPEC CPU2006 benchmark descriptions , 2006, CARN.

[2]  Saurabh Dighe,et al.  An 80-Tile 1.28TFLOPS Network-on-Chip in 65nm CMOS , 2007, 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers.

[3]  Henry Hoffmann,et al.  On-Chip Interconnection Architecture of the Tile Processor , 2007, IEEE Micro.

[4]  Seung Ryoul Maeng,et al.  Virtualizing performance asymmetric multi-core systems , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).

[5]  Lieven Eeckhout,et al.  Scheduling heterogeneous multi-cores through performance impact estimation (PIE) , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[6]  Tong Li,et al.  Efficient operating system scheduling for performance-asymmetric multi-core architectures , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).

[7]  Kevin Skadron,et al.  Scaling with Design Constraints: Predicting the Future of Big Chips , 2011, IEEE Micro.

[8]  Ludmila Cherkasova,et al.  Measuring CPU Overhead for I/O Processing in the Xen Virtual Machine Monitor , 2005, USENIX ATC, General Track.

[9]  Donald Yeung,et al.  BioBench: A Benchmark Suite of Bioinformatics Applications , 2005, IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005..

[10]  Roy T. Fielding,et al.  The Apache HTTP Server Project , 1997, IEEE Internet Comput..

[11]  Yingwei Luo,et al.  A Simple Cache Partitioning Approach in a Virtualized Environment , 2009, 2009 IEEE International Symposium on Parallel and Distributed Processing with Applications.

[12]  Smruti R. Sarangi,et al.  ParTejas: A parallel simulator for multicore processors , 2014, ISPASS.

[13]  Alex Landau,et al.  Towards exitless and efficient paravirtual I/O , 2012, SYSTOR '12.

[14]  Willy Zwaenepoel,et al.  Diagnosing performance overheads in the xen virtual machine environment , 2005, VEE '05.

[15]  Fabrice Bellard,et al.  QEMU, a Fast and Portable Dynamic Translator , 2005, USENIX ATC, FREENIX Track.

[16]  Kai Li,et al.  The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[17]  Stacey Jeffery,et al.  HASS: a scheduler for heterogeneous multicore systems , 2009, OPSR.

[18]  Manuel Prieto,et al.  Maximizing Power Efficiency with Asymmetric Multicore Systems , 2009, ACM Queue.

[19]  Muli Ben-Yehuda,et al.  SplitX: Split Guest/Hypervisor Execution on Multi-Core , 2011, WIOV.

[20]  Carl A. Waldspurger,et al.  Memory resource management in VMware ESX server , 2002, OSDI '02.

[21]  Norman P. Jouppi,et al.  Single-ISA heterogeneous multi-core architectures for multithreaded workload performance , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[22]  Prathmesh Kallurkar,et al.  Architectural Support for Handling Jitterin Shared Memory Based Parallel Applications , 2014, IEEE Transactions on Parallel and Distributed Systems.

[23]  Anoop Gupta,et al.  The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.

[24]  Alex Landau,et al.  ELI: bare-metal performance for I/O virtualization , 2012, ASPLOS XVII.