Research on Decentralized Group Replication Strategy Based on Correlated Patterns Mining in Data Grids

Most existing data-mining-based replication strategies cannot effectively extract correlations between files. To address this problem, a new decentralized replication strategy based on maximal frequent correlated pattern mining, called RSMFCP, is proposed. By converting the file access history into a binary access history, mining maximal frequent correlated patterns from it, and replicating files according to those patterns, RSMFCP can largely eliminate redundancy and optimize replication performance. Data analysis and simulation results show that, compared with no replication, PRA, DR2, and PDDRA, RSMFCP extracts correlations more effectively and achieves a lower mean job execution time under different access patterns, offering a new option for reducing transmission delay in data grids.
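
The three-step pipeline in the abstract (binary access history, maximal frequent correlated pattern mining, group replication) can be illustrated with a small sketch. The abstract does not specify RSMFCP's correlation measure, thresholds, how the access history is split into rows, or how patterns map to replica placement, so everything below is an assumption: each row is one hypothetical access "session", all-confidence is used as the correlation measure, and the names to_binary_history, maximal_correlated_patterns, replication_groups, min_sup and min_corr are hypothetical. It is a toy level-wise search, not the authors' implementation.

```python
from itertools import combinations


def to_binary_history(sessions, files):
    """Map each access session (a set of file names) to a 0/1 row, one column per file."""
    col = {name: j for j, name in enumerate(files)}
    rows = []
    for accessed in sessions:
        row = [0] * len(files)
        for name in accessed:
            row[col[name]] = 1
        rows.append(row)
    return rows


def support(rows, itemset):
    """Fraction of rows in which every file (column index) of `itemset` was accessed."""
    hits = sum(1 for row in rows if all(row[j] for j in itemset))
    return hits / len(rows)


def all_confidence(rows, itemset):
    """ASSUMED correlation measure: sup(X) divided by the largest single-file support in X."""
    peak = max(support(rows, (j,)) for j in itemset)
    return support(rows, itemset) / peak if peak else 0.0


def maximal_correlated_patterns(rows, min_sup, min_corr):
    """Apriori-style level-wise search, then keep only maximal patterns.

    Support and all-confidence are both anti-monotone, so pruning a candidate
    whose subsets fail either threshold is safe.
    """
    n_files = len(rows[0])
    level = [(j,) for j in range(n_files) if support(rows, (j,)) >= min_sup]
    found = list(level)
    while level:
        prev = set(level)
        candidates = set()
        for a, b in combinations(level, 2):
            if a[:-1] == b[:-1]:  # join k-itemsets sharing their first k-1 files
                cand = tuple(sorted(set(a) | set(b)))
                # every k-subset must itself be frequent and correlated
                if all(sub in prev for sub in combinations(cand, len(cand) - 1)):
                    candidates.add(cand)
        level = sorted(c for c in candidates
                       if support(rows, c) >= min_sup
                       and all_confidence(rows, c) >= min_corr)
        found.extend(level)
    # a pattern is maximal if no other frequent correlated pattern strictly contains it
    return [p for p in found if not any(set(p) < set(q) for q in found)]


def replication_groups(files, patterns):
    """Hypothetical grouping step: each maximal multi-file pattern becomes one replica group."""
    return [[files[j] for j in p] for p in patterns if len(p) > 1]


files = ["f1", "f2", "f3", "f4"]
sessions = [{"f1", "f2", "f3"}, {"f1", "f2"}, {"f1", "f2", "f3"}, {"f4"}, {"f3", "f4"}]
rows = to_binary_history(sessions, files)
patterns = maximal_correlated_patterns(rows, min_sup=0.4, min_corr=0.5)
print(replication_groups(files, patterns))  # [['f1', 'f2', 'f3']]
```

Here f1, f2 and f3 form one maximal correlated group, so under this assumed policy a request for any of them would trigger replication of all three, while the weakly co-accessed pair (f3, f4) is pruned by the support threshold. Enumerating every frequent pattern before filtering for maximality is only practical for toy inputs; dedicated maximal-itemset miners such as MAFIA or GenMax avoid that cost.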

[1] R. Manimegalai et al. Dynamic replica placement and selection strategies in data grids - A comprehensive survey. J. Parallel Distributed Comput., 2014.

[2] Jiawei Han et al. Re-examination of interestingness measures in pattern mining: a unified framework. Data Mining and Knowledge Discovery, 2010.

[3] Evelina Lamma et al. Statistical relational learning for workflow mining. Intell. Data Anal., 2016.

[4] Peter C. J. Graham et al. Adaptive popularity-driven replica placement in hierarchical data grids. The Journal of Supercomputing, 2010.

[5] Amir Masoud Rahmani et al. PDDRA: A new pre-fetching based dynamic data replication algorithm in data grids. Future Gener. Comput. Syst., 2012.

[6] Pensri Amornsinlaphachai et al. Efficiency of data mining models to predict academic performance and a cooperative learning model. 2016 8th International Conference on Knowledge and Smart Technology (KST), 2016.

[7] Albert Y. Zomaya et al. Hopfield neural network for simultaneous job scheduling and data replication in grids. Future Gener. Comput. Syst., 2013.

[8] Tomasz Imielinski et al. Mining association rules between sets of items in large databases. SIGMOD Conference, 1993.

[9] Tristan Glatard et al. A classification of file placement and replication methods on grids. Future Gener. Comput. Syst., 2013.

[10] Cheng Wang et al. Parallel data mining techniques on Graphics Processing Unit with Compute Unified Device Architecture (CUDA). The Journal of Supercomputing, 2011.

[11] Fang-Yie Leu et al. PFRF: An adaptive data replication algorithm based on star-topology data grids. Future Gener. Comput. Syst., 2012.