E2FS: an elastic storage system for cloud computing

In cloud storage, replication is essential to fault tolerance and high availability of data. However, achieving high availability through replication requires additional active servers in the storage system, which increases both power consumption and capital expenditure. Furthermore, because data is not classified, the replication scheme is fixed at the outset. This paper proposes E2FS, an elastic and efficient file storage system for big data applications. E2FS dynamically scales the storage system in and out according to the real-time demands of big data applications. We adopt a novel block-based replication scheme that provides fine-grained maintenance of the data in the storage system. E2FS analyzes the features of data and makes dynamic replication decisions to balance the cost and performance of cloud storage. To evaluate the proposed work, we implement a prototype of E2FS and compare it with HDFS. Our experiments show that E2FS can outperform HDFS in elasticity while achieving guaranteed performance for big data applications.
