Cloud Data Federation for Scientific Applications

Data-intensive scientific research needs storage capabilities that enable efficient data sharing. This is of great importance for many scientific domains, such as the Virtual Physiological Human. In this paper, we introduce a solution that federates a variety of storage systems, ranging from file servers to the more sophisticated systems used in clouds and grids. Our solution follows a client-centric approach that loosely couples data resources that may use different technologies, such as OpenStack Swift, iRODS, and GridFTP, and that may be geographically distributed. It is implemented as a lightweight service that does not require installing any software on the resources it uses. In this way we are able to use heterogeneous storage resources efficiently, reduce the complexity of working with multiple storage resources, and avoid vendor lock-in in the case of cloud storage. To demonstrate the usability of our approach, we performed a number of experiments that assess the performance and functionality of the developed system.
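The client-centric, loosely coupled design described above can be illustrated with a minimal Python sketch. It is not the implementation presented in the paper: the names `StorageBackend`, `InMemoryBackend`, and `FederatedStore`, the `replicas` parameter, and the replicate-then-fall-back policy are all assumptions made for illustration. The in-memory backend stands in for adapters that would speak to systems such as OpenStack Swift, iRODS, or GridFTP from the client side, so nothing needs to be installed on the resources themselves.

```python
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical common interface that every storage adapter would implement."""

    name: str

    @abstractmethod
    def put(self, path: str, data: bytes) -> None:
        """Store data under the given logical path."""

    @abstractmethod
    def get(self, path: str) -> bytes:
        """Return the data stored under path, or raise KeyError."""


class InMemoryBackend(StorageBackend):
    """Stand-in for a real adapter (assumption: real ones would wrap remote APIs)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._objects: dict[str, bytes] = {}

    def put(self, path: str, data: bytes) -> None:
        self._objects[path] = data

    def get(self, path: str) -> bytes:
        return self._objects[path]  # raises KeyError if the object is missing


class FederatedStore:
    """Client-side layer that presents several backends as one logical store."""

    def __init__(self, backends: list[StorageBackend], replicas: int = 2) -> None:
        if not backends:
            raise ValueError("at least one backend is required")
        self.backends = list(backends)
        # Never ask for more copies than there are backends.
        self.replicas = min(replicas, len(self.backends))
        # Catalog: logical path -> names of the backends holding a copy.
        self.catalog: dict[str, list[str]] = {}

    def write(self, path: str, data: bytes) -> None:
        # Illustrative policy: copy to the first `replicas` backends.
        targets = self.backends[: self.replicas]
        for backend in targets:
            backend.put(path, data)
        self.catalog[path] = [backend.name for backend in targets]

    def read(self, path: str) -> bytes:
        holders = self.catalog.get(path)
        if holders is None:
            raise FileNotFoundError(path)
        # Try each backend listed in the catalog until one returns the data.
        for backend in self.backends:
            if backend.name not in holders:
                continue
            try:
                return backend.get(path)
            except KeyError:
                continue  # this copy is gone; fall back to the next holder
        raise FileNotFoundError(path)
```

Because the caller only sees logical paths, a backend can be swapped for one using a different technology without changing client code, which is the sense in which such a layer helps avoid vendor lock-in.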
