Optimizing Data Placement for Cost Effective and High Available Multi-Cloud Storage

With the advent of the big data age, data volumes have grown from terabytes to petabytes at remarkable speed. Because cloud storage offers a virtually infinite pool of storage resources, data can be stored and accessed with high scalability and availability. However, storing data with a single cloud provider carries risks such as vendor lock-in, privacy leakage, and unavailability. Multi-cloud storage can mitigate these risks by using geographically distributed cloud storage providers. In this storage scheme, one important challenge is how to place a user's data cost-effectively while maintaining high availability. In this paper, an architecture for multi-cloud storage is first presented. Next, a multi-objective optimization problem is defined to minimize total cost and maximize data availability simultaneously; it is solved by an approach based on the non-dominated sorting genetic algorithm II (NSGA-II), which yields a set of non-dominated solutions called the Pareto-optimal set. Then, a method based on the entropy method is proposed to determine the most suitable solution for users who cannot choose one from the Pareto-optimal set directly. Finally, the performance of the proposed algorithm is validated by extensive experiments based on real-world multi-cloud storage scenarios.
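The two selection steps named above, finding non-dominated solutions and then picking one with entropy-based weights, can be illustrated with a small sketch. This is not the paper's implementation: the abstract gives no formulas, so the sketch assumes each candidate placement is summarized as a (cost, availability) pair, uses a brute-force non-dominated filter in place of the full NSGA-II search, and applies the standard entropy weight method with min-max normalization. The function names and any candidate values passed in are hypothetical.

```python
import math


def pareto_front(solutions):
    """Keep (cost, availability) pairs that no other pair dominates.

    Assumption: cost is minimized and availability is maximized.
    This is a brute-force filter, not the NSGA-II search itself.
    """
    front = []
    for i, (c_i, a_i) in enumerate(solutions):
        dominated = any(
            c_j <= c_i and a_j >= a_i and (c_j < c_i or a_j > a_i)
            for j, (c_j, a_j) in enumerate(solutions)
            if j != i
        )
        if not dominated:
            front.append((c_i, a_i))
    return front


def entropy_weights(matrix):
    """Standard entropy weight method over the columns of a non-negative matrix."""
    m, n = len(matrix), len(matrix[0])
    if m < 2:
        return [1.0 / n] * n  # entropy is undefined for a single row
    divergence = []
    for k in range(n):
        column = [row[k] for row in matrix]
        total = sum(column)
        if total == 0:
            divergence.append(0.0)  # a constant zero column carries no information
            continue
        probs = [v / total for v in column]
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(m)
        divergence.append(1.0 - entropy)
    norm = sum(divergence)
    if norm == 0:
        return [1.0 / n] * n  # no objective discriminates; fall back to equal weights
    return [d / norm for d in divergence]


def select_solution(front):
    """Score each Pareto-optimal pair with entropy weights and return the best."""
    costs = [c for c, _ in front]
    avails = [a for _, a in front]

    def scale(value, lo, hi, benefit):
        # Min-max normalization so that larger is always better.
        if hi == lo:
            return 1.0
        return (value - lo) / (hi - lo) if benefit else (hi - value) / (hi - lo)

    normalized = [
        [
            scale(c, min(costs), max(costs), benefit=False),
            scale(a, min(avails), max(avails), benefit=True),
        ]
        for c, a in front
    ]
    weights = entropy_weights(normalized)
    scores = [sum(w * x for w, x in zip(weights, row)) for row in normalized]
    best = max(range(len(front)), key=scores.__getitem__)
    return front[best], weights
```

A caller would first pass all candidate placements to `pareto_front`, then hand the result to `select_solution`. The weights are derived from the data alone, which is why this kind of method suits users who have no prior preference between cost and availability.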