A group based genetic algorithm data replica placement strategy for scientific workflow

Running data-intensive scientific workflows in a multi-data-center environment inevitably causes massive data movement. The emergence of cloud computing technologies offers a new way to develop scientific workflow systems, and using dataset replicas to reduce data transfer among data centers is an important issue. In this paper, we propose a group-based genetic algorithm that makes full use of dataset replicas to reduce data transmission in the cloud. We compare the performance of the proposed algorithm with that of a random algorithm and the K-means algorithm. The results show that the proposed algorithm effectively reduces data movement among data centers and improves the performance of data-intensive scientific workflows.
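To make the objective concrete, the following is a minimal sketch of how data movement might be counted for a given replica placement. It is not the paper's algorithm or cost model: the function name `count_data_movements`, the task and dataset identifiers, and the rule that one missing local replica costs one transfer are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def count_data_movements(
    task_site: Dict[str, str],
    task_inputs: Dict[str, List[str]],
    replica_sites: Dict[str, Set[str]],
) -> int:
    """Count cross-data-center dataset transfers for one placement.

    Assumption (not from the paper): a task needs one transfer for each
    input dataset that has no replica in the task's own data center.
    """
    movements = 0
    for task, site in task_site.items():
        for dataset in task_inputs[task]:
            if site not in replica_sites[dataset]:
                movements += 1
    return movements


# Hypothetical workflow: three tasks spread over two data centers.
task_site = {"t1": "dc1", "t2": "dc2", "t3": "dc2"}
task_inputs = {"t1": ["d1"], "t2": ["d1", "d2"], "t3": ["d2"]}

# Placement without replicas: both datasets live only in dc1.
single_copy = {"d1": {"dc1"}, "d2": {"dc1"}}
# Placement with one extra replica of d1 in dc2.
with_replica = {"d1": {"dc1", "dc2"}, "d2": {"dc1"}}

print(count_data_movements(task_site, task_inputs, single_copy))
print(count_data_movements(task_site, task_inputs, with_replica))
```

A placement search such as a genetic algorithm could use a count like this as part of its fitness function, with lower values indicating better placements. A realistic model would also weigh dataset sizes and storage limits.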
