A novel system architecture for web scale applications using lightweight CPUs and virtualized I/O

Large web-scale applications typically use a distributed platform, such as a cluster of commodity servers, to achieve scalable and low-cost processing. The Map-Reduce framework and its open-source implementation, Hadoop, are commonly used to program these applications. Since these applications scale well with an increased number of servers, the cluster size is an important parameter. Cluster size, however, is constrained by power consumption. In this paper, we present a system that uses low-power CPUs to increase the cluster size within a fixed power budget. With low-power CPUs, the majority of a server's power is consumed by the I/O sub-system. To overcome this, we develop a virtualized I/O sub-system in which multiple servers share I/O resources. An ASIC-based high-bandwidth interconnect fabric and FPGA-based I/O cards implement this virtualized I/O. The resulting system is the first production-quality implementation of a cluster-in-a-box that uses low-power CPUs. The design demonstrates a way to build systems from low-power CPUs, allowing a much larger number of servers in a cluster within the same power envelope. We also discuss the operating system optimizations needed to overcome software inefficiency and increase the utilization of virtualized disk bandwidth. We built hardware based on these ideas, and experiments on this system show a 3X average improvement in performance-per-Watt-hour compared to a commodity cluster with the same power budget.
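
The power argument in the abstract can be illustrated with a small back-of-the-envelope sketch. All wattages, the sharing factor, and the function names below are hypothetical values chosen for illustration; they are not measurements from the paper. The sketch also assumes that a shared I/O unit draws the same power as a dedicated one.

```python
# Illustrative sketch only: every number here is a hypothetical assumption,
# not a figure reported in the paper.

def per_server_power(cpu_w, io_w, share=1):
    """Power attributed to one server when `share` servers split one I/O unit."""
    return cpu_w + io_w / share

def servers_in_budget(budget_w, cpu_w, io_w, share=1):
    """Number of whole servers that fit in a fixed power budget."""
    return int(budget_w // per_server_power(cpu_w, io_w, share))

def io_fraction(cpu_w, io_w, share=1):
    """Fraction of a server's power that goes to I/O."""
    return (io_w / share) / per_server_power(cpu_w, io_w, share)

BUDGET_W = 1000   # hypothetical cluster power budget
IO_W = 20         # hypothetical power of one I/O sub-system

# Conventional CPU (80 W, assumed) with dedicated I/O: I/O is a minor share.
print(servers_in_budget(BUDGET_W, 80, IO_W), io_fraction(80, IO_W))

# Low-power CPU (5 W, assumed) with dedicated I/O: I/O now dominates.
print(servers_in_budget(BUDGET_W, 5, IO_W), io_fraction(5, IO_W))

# Low-power CPU with one I/O unit shared by 8 servers (assumed sharing factor).
print(servers_in_budget(BUDGET_W, 5, IO_W, share=8), io_fraction(5, IO_W, share=8))
```

Under these assumed numbers, switching to a low-power CPU alone raises the server count from 10 to 40 but leaves 80% of each server's power in I/O; sharing one I/O unit among 8 servers raises the count to 133 and cuts the I/O share to about one third.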
