Wide area data replication in an ITER-relevant data environment

Abstract The next generation of tokamak experiments will require a new approach to data sharing among fusion organizations. Many researchers at sites around the world will analyse the data produced by the International Thermonuclear Experimental Reactor (ITER), wherever it is built. In this context, making the data efficiently available at the sites where the computational resources are located becomes a major architectural issue for the deployment of the ITER computational infrastructure. The approach described in this paper goes beyond the usual site-centric model, which is mainly devoted to granting access to experimental data stored exclusively at the device site. To this end, we propose a new data replication architecture that operates over a wide area network and is based on a Master/Slave model and on synchronization techniques that produce mirrored data sites. In this architecture, data replication covers large (terabyte-scale) databases as well as large UNIX-like file systems, and relies on open-source software components: MySQL as the database management system, and RSYNC and BBFTP for data transfer. A test-bed has been set up to evaluate the performance of the software components underlying the proposed architecture. The test-bed hardware consists of a cluster of four dual-Xeon Supermicro servers, each with a 1 TB RAID array. A high-performance network link (1 Gbit/s over 400 km) provides the infrastructure for testing the components on a wide area network. The results obtained are discussed in detail.
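To give a sense of scale for the test-bed figures quoted above, the following Python sketch estimates the ideal time needed to mirror one 1 TB array over a 1 Gbit/s link, together with the bandwidth-delay product of a 400 km path. It is a back-of-the-envelope calculation, not a measurement from the test-bed. Several values are assumptions introduced here for illustration: 1 TB is taken as 10^12 bytes, the signal speed in optical fibre is taken as roughly 2 x 10^8 m/s, the round-trip time counts propagation delay only, and the `efficiency` parameter and the function names are hypothetical.

```python
# Back-of-the-envelope estimate for the test-bed link (not measured data).

TB = 10**12          # assumption: 1 TB taken as 10^12 bytes
GBIT_PER_S = 10**9   # 1 Gbit/s line rate, as quoted in the abstract
DISTANCE_M = 400_000           # 400 km, as quoted in the abstract
FIBRE_SPEED_M_PER_S = 2.0e8    # assumption: roughly 2/3 of c in optical fibre


def transfer_time_seconds(size_bytes, link_bits_per_s, efficiency=1.0):
    """Time to move size_bytes at a given fraction of the line rate."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return size_bytes * 8 / (link_bits_per_s * efficiency)


def bandwidth_delay_product_bytes(link_bits_per_s, rtt_s):
    """Bytes that must be in flight to keep the link full."""
    return link_bits_per_s * rtt_s / 8


# Propagation-only round-trip time; real RTT also includes equipment delays.
rtt_s = 2 * DISTANCE_M / FIBRE_SPEED_M_PER_S

ideal_s = transfer_time_seconds(TB, GBIT_PER_S)
bdp_bytes = bandwidth_delay_product_bytes(GBIT_PER_S, rtt_s)

print(f"ideal transfer time for 1 TB: {ideal_s:.0f} s ({ideal_s / 3600:.2f} h)")
print(f"propagation-only RTT: {rtt_s * 1000:.1f} ms")
print(f"bandwidth-delay product: {bdp_bytes / 1000:.0f} kB")
```

Under these assumptions a full 1 TB copy takes about 8000 s (a little over two hours) at line rate, and roughly 500 kB must be in flight to keep the link busy. A single TCP stream with a smaller window than that would not fill the link, which is one reason a multi-stream transfer tool such as BBFTP is of interest on long-distance paths.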