A Data Placement Algorithm for Data Intensive Applications in Cloud

Data layout is an important problem in cloud systems: its goal is to reduce data movement among data centers and thereby improve the efficiency of the entire system. This paper proposes a data layout algorithm oriented toward data-intensive applications. It is based on hierarchical data correlation clustering and the particle swarm optimization (PSO) algorithm. Datasets with fixed locations are taken into account, and both an offline strategy and an online strategy for data layout are given. Because the proposed strategy aims at reducing the global amount of data transmission, and the special permission of the datasets is introduced, the cost of data transmission can be measured more reasonably. Simulation results show that, compared with two classical strategies, our algorithm reduces the amount of data transmission more effectively.
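As a rough illustration of the quantities involved, the following minimal sketch computes a pairwise dataset correlation (how many tasks use both datasets) and the total amount of data transmitted under a given layout. It is not the algorithm proposed in the paper. All names (`correlation`, `transmission_amount`, `is_feasible`, the example datasets and data centers) are hypothetical, and the cost model is an assumption: each task is taken to run at the data center that already holds the largest share of its input data.

```python
from itertools import combinations


def correlation(tasks):
    """Count, for each pair of datasets, how many tasks use both of them."""
    corr = {}
    for used in tasks.values():
        for a, b in combinations(sorted(used), 2):
            corr[(a, b)] = corr.get((a, b), 0) + 1
    return corr


def transmission_amount(layout, tasks, sizes):
    """Total data moved, assuming each task runs where most of its input resides."""
    total = 0
    for used in tasks.values():
        # Sum the size of this task's input held at each data center.
        local = {}
        for d in used:
            center = layout[d]
            local[center] = local.get(center, 0) + sizes[d]
        # Everything not already at the best data center must be transferred.
        total += sum(sizes[d] for d in used) - max(local.values())
    return total


def is_feasible(layout, fixed):
    """A layout is feasible only if every fixed-location dataset stays in place."""
    return all(layout[d] == center for d, center in fixed.items())


# Hypothetical example: three datasets, three tasks, one fixed-location dataset.
sizes = {"d1": 10, "d2": 20, "d3": 5}
tasks = {"t1": {"d1", "d2"}, "t2": {"d2", "d3"}, "t3": {"d1", "d2"}}
fixed = {"d3": "dc2"}

layout_a = {"d1": "dc1", "d2": "dc1", "d3": "dc2"}  # correlated d1, d2 together
layout_b = {"d1": "dc1", "d2": "dc2", "d3": "dc2"}  # correlated d1, d2 split

print(correlation(tasks))
print(transmission_amount(layout_a, tasks, sizes))
print(transmission_amount(layout_b, tasks, sizes))
```

In this toy setting, keeping the highly correlated datasets `d1` and `d2` in the same data center yields less transmission than splitting them, which is the intuition behind clustering by correlation. A search procedure such as PSO would then explore feasible layouts and score each one with a cost function of this kind.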
