Popularity-Aware Multi-Failure Resilient and Cost-Effective Replication for High Data Durability in Cloud Storage

Large-scale data stores are an increasingly important component of cloud datacenter services. However, cloud storage systems often experience data loss, which undermines data durability. Three-way random replication is commonly used to improve data durability in cloud storage systems, but it cannot effectively handle correlated machine failures to prevent data loss. Although Copyset Replication and Tiered Replication reduce data loss under both correlated and independent failures and thus enhance data durability, they fail to exploit differences in data popularity to substantially reduce the storage and bandwidth costs caused by replication. To address these issues, we present a popularity-aware multi-failure resilient and cost-effective replication (PMCR) scheme for high data durability in cloud storage. PMCR splits the cloud storage system into a primary tier and a backup tier, and classifies data into hot, warm and cold data based on data popularity. To handle both correlated and independent failures, PMCR stores the three replicas of the same data in one copyset formed by two servers in the primary tier and one server in the backup tier. For the third replicas of warm and cold data in the backup tier, PMCR applies compression to reduce storage and bandwidth costs. Extensive numerical results based on trace parameters and experimental results on real-world Amazon S3 show that PMCR achieves higher data durability, a lower probability of data loss, and lower storage and bandwidth costs than previous replication schemes.
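
The sketch below is a rough, simplified illustration of the placement rule described above: two replicas go to a copyset pair in the primary tier and one replica to the backup tier, with the backup replica compressed only for warm and cold data. The names (Server, classify, place_replicas), the read-rate thresholds, and the use of zlib are assumptions made for illustration only; PMCR's actual popularity classification, copyset construction, and compression methods are defined in the paper and may differ.

```python
# Minimal sketch of PMCR-style replica placement (illustrative assumptions only).
import zlib
from dataclasses import dataclass, field

@dataclass
class Server:
    name: str
    blocks: dict = field(default_factory=dict)  # block_id -> stored bytes

def classify(read_rate, hot_threshold=100.0, warm_threshold=10.0):
    """Label data as hot/warm/cold from its recent read rate (reads/hour).
    Thresholds are hypothetical, not taken from the paper."""
    if read_rate >= hot_threshold:
        return "hot"
    return "warm" if read_rate >= warm_threshold else "cold"

def place_replicas(block_id, data, read_rate, primary_pair, backup_server):
    """Store two replicas on a primary-tier copyset pair and one replica in
    the backup tier; compress the backup replica for warm/cold data only."""
    label = classify(read_rate)
    for server in primary_pair:                 # two uncompressed primary replicas
        server.blocks[block_id] = data
    if label == "hot":
        backup_server.blocks[block_id] = data   # hot data kept uncompressed
    else:
        backup_server.blocks[block_id] = zlib.compress(data)  # warm/cold compressed

# Example: one copyset of two primary servers plus one backup server.
p1, p2, b1 = Server("p1"), Server("p2"), Server("b1")
place_replicas("blk-42", b"some object bytes" * 100, read_rate=3.0,
               primary_pair=(p1, p2), backup_server=b1)
```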
