Migrating enterprise storage applications to the cloud

Cloud computing has emerged as a model for hosting computing infrastructure and outsourcing the management of that infrastructure. It offers the promise of simplified provisioning and management, lower costs, and access to resources that scale up and down with demand. Cloud computing has seen growing use for Web site hosting, large batch processing jobs, and similar tasks. Despite these potential advantages, cloud computing remains little used for enterprise applications such as backup, shared file systems, and other internal systems. Many challenges must be overcome to make the cloud suitable for these applications, among them cost, performance, security, and interface mismatch.

In this dissertation, I investigate how cloud infrastructure can be used for internal services in an organization, with a focus on storage applications. I show how to design systems that address the challenges of using the cloud by building two example systems. The first, Cumulus, implements file system backup to a remote cloud storage provider. With Cumulus, I consider the constraints imposed by the interface to cloud storage and how to work within those constraints to minimize cost.

The second system, BlueSky, is a shared network file server backed by cloud storage. BlueSky builds on ideas from Cumulus to reduce system cost. It relies on cloud storage for data durability but provides good performance by caching data locally. I also study how file system maintenance tasks can be offloaded to the cloud while protecting the confidentiality and integrity of file system data. Together, these two systems demonstrate that, despite the challenges, we can bring the benefits of the cloud to enterprise storage applications.
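To make the cost constraint of a cloud storage interface concrete: a simple put/get object store writes each object whole and may bill per request as well as per byte stored, so uploading many small files as one object each multiplies request charges. A general way to work within such an interface, and the idea behind Cumulus's segment-based storage, is to aggregate small files into larger segments before upload. The following is a minimal Python sketch under stated assumptions, not Cumulus's implementation: `Segment`, `pack_into_segments`, `upload_cost`, and both fee constants are hypothetical, and greedy, order-preserving packing is only one simple policy.

```python
from dataclasses import dataclass, field

# Hypothetical prices, for illustration only; real provider pricing differs and changes.
PUT_FEE_PER_REQUEST = 0.01 / 1000   # dollars per PUT request (assumed)
STORAGE_FEE_PER_GIB_MONTH = 0.15    # dollars per GiB-month (assumed)


@dataclass
class Segment:
    """A group of small files uploaded to cloud storage as a single object."""
    files: list[str] = field(default_factory=list)
    size: int = 0


def pack_into_segments(file_sizes: list[tuple[str, int]],
                       target_segment_bytes: int) -> list[Segment]:
    """Greedily pack (name, size) pairs, in order, into segments.

    A new segment starts whenever adding the next file would exceed
    target_segment_bytes; a file larger than the target gets its own segment.
    """
    segments: list[Segment] = []
    current = Segment()
    for name, size in file_sizes:
        if current.files and current.size + size > target_segment_bytes:
            segments.append(current)
            current = Segment()
        current.files.append(name)
        current.size += size
    if current.files:
        segments.append(current)
    return segments


def upload_cost(num_objects: int, total_bytes: int) -> float:
    """Cost of one PUT per object plus one month of storage for total_bytes."""
    return (num_objects * PUT_FEE_PER_REQUEST
            + total_bytes / 2**30 * STORAGE_FEE_PER_GIB_MONTH)


# Usage: 10,000 small files of 4 KiB each, packed into 4 MiB segments.
files = [(f"file{i}", 4096) for i in range(10_000)]
segments = pack_into_segments(files, target_segment_bytes=4 * 2**20)
total = sum(size for _, size in files)
print(len(files), "objects unpacked vs", len(segments), "segments")
# prints: 10000 objects unpacked vs 10 segments
print(upload_cost(len(files), total), upload_cost(len(segments), total))
```

The stored byte count is the same either way; only the number of requests changes. The trade-off is space management: once files inside a segment are deleted or modified, their stale bytes keep occupying storage until the segment is repacked, so a system built this way also needs some form of segment cleaning.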
