The sharing architecture: sub-core configurability for IaaS clouds

Businesses and academics are increasingly turning to Infrastructure as a Service (IaaS) clouds such as Amazon's Elastic Compute Cloud (EC2) to fulfill their computing needs. Unfortunately, current IaaS systems provide a severely restricted palette of rentable computing options, which do not optimally fit the workloads they execute. We address this challenge by proposing and evaluating a manycore architecture, called the Sharing Architecture, that is specifically optimized for IaaS systems by being reconfigurable on a sub-core basis. The Sharing Architecture enables better matching of workloads to micro-architectural resources by replacing static cores with Virtual Cores, which can be dynamically reconfigured to have different numbers of ALUs and amounts of cache. This reconfigurability provides many of the benefits of heterogeneous multicores, but in a homogeneous fabric, and enables the reuse and resale of resources on a per-ALU or per-KB-of-cache basis. The Sharing Architecture leverages Distributed ILP techniques, but is designed so that it does not require recompilation. In addition, we introduce an economic model that is enabled by the Sharing Architecture and show how users with varying needs can be better served by such a flexible architecture. We evaluate the Sharing Architecture across a benchmark suite of Apache, SPECint, and parts of PARSEC, and find that it can achieve a market up to 5x more economically efficient than static-architecture multicores. We implemented the Sharing Architecture in Verilog and present area overhead results.
