Flat Datacenter Storage

Flat Datacenter Storage (FDS) is a high-performance, fault-tolerant, large-scale, locality-oblivious blob store. Using a novel combination of full bisection bandwidth networks, data and metadata striping, and flow control, FDS multiplexes an application's large-scale I/O across the available throughput and latency budget of every disk in a cluster. FDS therefore makes many optimizations around data locality unnecessary. Disks also communicate with each other at their full bandwidth, making recovery from disk failures extremely fast. FDS is designed for datacenter scale, fully distributing metadata operations that might otherwise become a bottleneck. FDS applications achieve single-process read and write performance of more than 2 GB/s. We measure recovery of 92 GB of data lost to a disk failure in 6.2 s, and recovery from a total machine failure with 655 GB of data in 33.7 s. Application performance is also high: we describe our FDS-based sort application, which set the 2012 world record for disk-to-disk sorting.
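
The abstract attributes both high throughput and fast recovery to spreading data across every disk and letting disks exchange data at full bandwidth. The Python sketch below illustrates that reasoning under loudly stated assumptions: the placement table (one row per ordered pair of distinct disks), the SHA-256-based starting offset, two-way replication, the cluster size, and the names `build_table`, `locate`, `recovery_peers`, and `aggregate_rate_gb_s` are all hypothetical and are not FDS's actual implementation. It also works out the aggregate recovery rates implied by the measurements quoted above.

```python
from __future__ import annotations

import hashlib

# Hypothetical cluster size; not a figure from the FDS abstract.
NUM_DISKS = 100


def build_table(num_disks: int) -> list[tuple[int, int]]:
    """Hypothetical placement table: one row per ordered pair of distinct
    disks, each row naming the two disks that hold a tract's replicas."""
    return [(a, b) for a in range(num_disks) for b in range(num_disks) if a != b]


def locate(table: list[tuple[int, int]], blob_id: str, tract: int) -> tuple[int, int]:
    """Map (blob, tract index) to a table row. The blob's hash picks a
    starting row and consecutive tracts take consecutive rows, so one
    blob is striped across the cluster. Any client holding a copy of the
    table can compute this locally."""
    digest = hashlib.sha256(blob_id.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big")
    return table[(start + tract) % len(table)]


def recovery_peers(table: list[tuple[int, int]], failed: int) -> set[int]:
    """Disks holding the surviving replica of any row that includes the
    failed disk; each of them can re-replicate its share in parallel."""
    peers: set[int] = set()
    for a, b in table:
        if a == failed:
            peers.add(b)
        elif b == failed:
            peers.add(a)
    return peers


def aggregate_rate_gb_s(gigabytes: float, seconds: float) -> float:
    """Average recovery throughput across the whole cluster, in GB/s."""
    return gigabytes / seconds


if __name__ == "__main__":
    table = build_table(NUM_DISKS)
    disks = {d for t in range(1000) for d in locate(table, "blob-42", t)}
    print(len(disks))                                    # 100: every disk holds part of the blob
    print(len(recovery_peers(table, failed=7)))          # 99: all surviving disks help recover
    print(f"{aggregate_rate_gb_s(92, 6.2):.1f} GB/s")    # 14.8 GB/s (disk failure)
    print(f"{aggregate_rate_gb_s(655, 33.7):.1f} GB/s")  # 19.4 GB/s (machine failure)
```

Because consecutive tracts of a blob map to consecutive rows in this hypothetical table, a large read or write fans out across the whole cluster; and because every disk is paired with every other disk, the replicas needed to rebuild a failed disk are spread across all surviving disks, which can copy in parallel. The arithmetic gives roughly 14.8 GB/s of aggregate throughput for the 92 GB disk recovery and 19.4 GB/s for the 655 GB machine recovery.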