Application of association rule mining for replication in scientific data grid

Grid computing is the most popular infrastructure in many emerging fields of science and engineering, where extensive data-driven experiments are conducted by thousands of scientists all over the world. Efficient transfer and replication of these petabyte-scale data sets are the fundamental challenges in scientific grids. Data grid technology was developed to permit data sharing across many organizations in geographically dispersed locations. Replication creates additional copies of the same data sets across the grid, which improves data reliability and availability and lets researchers access those data sets more efficiently. In this paper, we propose a data-mining-based replication strategy to reduce data access time. Our proposed algorithm mines the hidden association rules among different files and uses them for replica optimization, which proves highly efficient for different access patterns. The algorithm is simulated using OptorSim, the data grid simulator developed by the European Data Grid project. In these simulations, our algorithm outperforms the existing approaches it is compared against.
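The abstract does not spell out the mining step, so the following is only a minimal sketch of how association rules among files could be derived from access histories and used to pick replication candidates. It assumes each access session is a list of file names, considers only pairwise rules, and uses hypothetical names and thresholds (`mine_pair_rules`, `files_to_prefetch`, `min_support`, `min_confidence`); it is not the paper's algorithm.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(sessions, min_support=0.5, min_confidence=0.6):
    """Mine pairwise rules 'lhs accessed => rhs accessed' from access sessions.

    Hypothetical helper: sessions is a list of lists of file names.
    Returns a list of (lhs, rhs, support, confidence) tuples.
    """
    n = len(sessions)
    if n == 0:
        return []

    item_counts = Counter()
    pair_counts = Counter()
    for session in sessions:
        files = sorted(set(session))  # ignore repeated accesses within a session
        item_counts.update(files)
        pair_counts.update(combinations(files, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of sessions containing both files
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


def files_to_prefetch(requested, rules):
    """Return files associated with the requested file (replication candidates)."""
    return sorted({rhs for lhs, rhs, _, _ in rules if lhs == requested})
```

A site that receives a request for one file could call `files_to_prefetch` with the mined rules and replicate the returned files alongside it. A full Apriori-style miner would also consider itemsets larger than pairs.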
