Recent advances in instrumentation, together with computational breakthroughs, have enabled researchers to gather increasingly large datasets. Processing, analyzing, and storing such datasets efficiently requires High Performance Computing (HPC) systems. Various institutions provide HPC resources and services to the scientific community for open-science research, often at no direct cost to the community. However, keeping large datasets (terabytes and above) in place for processing and analysis is usually not an option at open-science data centers, because the resources are shared among multiple projects and the disk space allotted to any user account is limited. Large volumes of data with short-term, mid-term, or long-term retention value are therefore kept on large-capacity online disks and/or tape-based archival systems, and the data is moved between storage and computational resources as needed. In this setting, optimal strategies for data movement, data archiving, and data preservation are crucial. In this chapter, we discuss the state of the practice in data storage infrastructure, data movement, data archiving, and data preservation at open-science data centers supporting data-intensive computing.