Energy-efficient Data-intensive Computing with a Fast Array of Wimpy Nodes

Abstract: Large-scale data-intensive computing systems have become a critical foundation for Internet-scale services. Their widespread growth during the past decade has raised datacenter energy demand and created an increasingly large financial burden and scaling challenge: peak energy requirements today are a significant cost of provisioning and operating datacenters. In this thesis we propose to reduce the peak energy consumption of datacenters by using FAWN, a Fast Array of Wimpy Nodes. FAWN is an approach to building datacenter server clusters using low-cost, low-power servers that are individually optimized for energy efficiency rather than raw performance alone. FAWN systems, however, have a different set of resource constraints than traditional systems, and these constraints can prevent existing software from reaping the energy-efficiency benefits FAWN systems can provide. This dissertation describes the principles behind FAWN and the software techniques necessary to unlock its energy-efficiency potential. First, we present an in-depth study of building FAWN-KV, a distributed, log-structured key-value storage system designed for an early FAWN prototype. Second, we present a broader classification and workload analysis showing when FAWN can be more energy-efficient and under what workload conditions a FAWN cluster would perform poorly in comparison to a smaller number of high-speed systems. Last, we describe modern trends that portend a narrowing gap between CPU and I/O capability and highlight the challenges endemic to all future balanced systems. Using FAWN as an early example, we demonstrate that pervasive use of vector interfaces throughout distributed storage systems can improve throughput by an order of magnitude and eliminate the redundant work found in many data-intensive workloads.
