Visions for data management and remote collaboration for ITER

Challenges

The need for efficient between-shot analysis and visualization is driven by the high cost of operating the experimental facility. ("Shots" are the basic units of fusion experiments. Today, a typical large facility might take shots at a rate of 2-4 per hour and accumulate about 2,000 shots per year.) The average cost per shot for ITER, defined here as the integrated project cost divided by the total number of shots estimated over the project lifetime, will approach one million US dollars. The number of shots required to optimize performance and to carry out experimental programs must therefore be minimized, which translates into a need for extensive analysis and assessment immediately after each shot.

ITER shots will also be much longer than on most current machines and will generate much more data, perhaps a terabyte per shot. The quantity of data itself, perhaps 2 PB per year, will likely not be a technical challenge by the time ITER is operating, about a decade from now. However, long-pulse operation will require concurrent writing, reading, visualization and analysis of experimental data. More challenging still is integration across time scales: the data set will span more than a factor of 10^9 in significant time scales, leading to requirements for efficient browsing of very long data records and for the ability to describe and locate specific events accurately within very long time series.

ITER will not only be an expensive device; as a licensed nuclear facility and the first reactor-scale fusion experiment, security of the plant will be a paramount concern. The data systems must balance these requirements against the need to keep data access as open as possible to the participating scientists. Mechanisms and modalities for remote control must also fit into a robust security model.

Further, the 10-year construction period and 15+ year operating life of ITER will encompass evolutionary and revolutionary changes in hardware, software and protocols. The system must therefore be based on a conceptual design that is extensible, flexible and robust enough to meet new requirements, and capable of adapting and migrating to new technologies and computing platforms as they arise. Backward compatibility, the ability to read old data and to rerun old analyses, must be maintained over the life of the experiment.
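As a purely illustrative reading of the cost-per-shot definition above, with round numbers assumed here only to reproduce the quoted order of magnitude (they are not official ITER figures):

    \text{cost per shot} \;=\; \frac{C_{\text{project}}}{N_{\text{shots}}}
    \;\approx\; \frac{\$2 \times 10^{10}}{2 \times 10^{4}\ \text{shots}}
    \;=\; \$10^{6} \text{ per shot}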
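The requirement that long-pulse data be written, read, visualized and analysed concurrently can be pictured with a minimal Python sketch. All names here are hypothetical, and the in-memory store merely stands in for whatever archive technology ITER eventually adopts.

    # Minimal sketch: a writer streams samples into a growing shot record while a
    # reader concurrently pulls the newest data for analysis/visualization before
    # the shot ends. Illustrative only; the deque stands in for a real archive.
    import collections
    import threading
    import time

    class ShotRecord:
        """Append-only record of (time, value) samples for one signal."""
        def __init__(self):
            self._samples = collections.deque()
            self._lock = threading.Lock()

        def append(self, t, value):
            with self._lock:
                self._samples.append((t, value))

        def read_since(self, t0):
            """Return samples with time >= t0 without holding the lock for long."""
            with self._lock:
                return [(t, v) for (t, v) in self._samples if t >= t0]

    def acquire(record, duration_s=2.0, rate_hz=100.0):
        """Writer: simulates a diagnostic streaming data during a long pulse."""
        t, dt = 0.0, 1.0 / rate_hz
        while t < duration_s:
            record.append(t, t * t)          # placeholder signal
            t += dt
            time.sleep(dt)

    def monitor(record, duration_s=2.0, period_s=0.5):
        """Reader: periodic 'live' analysis while the shot is still in progress."""
        last_t, elapsed = 0.0, 0.0
        while elapsed < duration_s:
            time.sleep(period_s)
            elapsed += period_s
            new = record.read_since(last_t)
            if new:
                last_t = new[-1][0] + 1e-9
                print(f"analysed {len(new)} new samples up to t = {new[-1][0]:.2f} s")

    if __name__ == "__main__":
        rec = ShotRecord()
        writer = threading.Thread(target=acquire, args=(rec,))
        reader = threading.Thread(target=monitor, args=(rec,))
        writer.start(); reader.start()
        writer.join(); reader.join()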
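One common way to make records spanning a factor of 10^9 in time scales browsable, and to locate brief events within them, is a multi-resolution summary pyramid that a display client can drill down through. The Python sketch below shows the idea; it is illustrative only, not a proposed ITER design.

    # Sketch of a multi-resolution index for very long time series: each level keeps
    # per-block extrema of the level below, so a client can browse at coarse
    # resolution and drill down to individual samples.
    import numpy as np

    def build_pyramid(samples, factor=10, min_len=1000):
        """Build successively coarser summaries (max absolute value per block)."""
        levels = [np.asarray(samples, dtype=float)]
        while len(levels[-1]) > min_len * factor:
            current = levels[-1]
            # (tail samples not covered by a full block are ignored in this sketch)
            n = (len(current) // factor) * factor
            blocks = current[:n].reshape(-1, factor)
            # per-block max |x|, so brief events stay visible at coarse zoom levels
            levels.append(np.maximum(blocks.max(axis=1), -blocks.min(axis=1)))
        return levels

    def browse(levels, level, start, width):
        """Return a display window from the chosen resolution level."""
        return levels[level][start:start + width]

    def locate_event(levels, threshold, factor=10):
        """Drill down from the coarsest level to the first sample with |x| > threshold."""
        lo, hi = 0, len(levels[-1])
        for level in range(len(levels) - 1, -1, -1):
            window = np.abs(levels[level][lo:hi])
            hits = np.nonzero(window > threshold)[0]
            if hits.size == 0:
                return None
            first = lo + int(hits[0])
            if level == 0:
                return first
            lo, hi = first * factor, (first + 1) * factor
        return None

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        signal = rng.normal(0.0, 1.0, 10_000_000)
        signal[7_654_321] = 50.0                 # an isolated "event" in a long record
        pyramid = build_pyramid(signal)
        print("samples per level:", [len(lv) for lv in pyramid])
        print("coarse view near the event:", browse(pyramid, level=3, start=7650, width=5))
        print("event located at sample index:", locate_event(pyramid, threshold=10.0))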
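Backward compatibility over decades is often approached with self-describing, version-tagged records whose readers are retained and registered alongside newer ones. The following Python sketch shows that idea with hypothetical names and a JSON stand-in for the real storage format; it is not the ITER data system's design.

    # Sketch: every record carries a format version, and readers for old versions
    # stay registered so old data remains readable and old analyses can be rerun.
    import json

    READERS = {}

    def reader(version):
        """Register a reader for one on-disk format version."""
        def register(fn):
            READERS[version] = fn
            return fn
        return register

    @reader(1)
    def read_v1(payload):
        # v1 stored a bare sample list with implicit units
        return {"samples": payload["data"], "units": "a.u.", "t0": 0.0}

    @reader(2)
    def read_v2(payload):
        # v2 added explicit units and a time base
        return {"samples": payload["samples"],
                "units": payload["units"],
                "t0": payload.get("t0", 0.0)}

    def load_record(raw):
        """Dispatch on the embedded version so analyses keep working on old data."""
        doc = json.loads(raw)
        version = doc.get("version")
        if version not in READERS:
            raise ValueError(f"no reader registered for format version {version}")
        return READERS[version](doc["payload"])

    if __name__ == "__main__":
        old = json.dumps({"version": 1, "payload": {"data": [1, 2, 3]}})
        new = json.dumps({"version": 2,
                          "payload": {"samples": [1, 2, 3], "units": "T", "t0": 0.0}})
        print(load_record(old))
        print(load_record(new))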