A Framework for International Collaboration on ITER Using Large-Scale Data Transfer to Enable Near-Real-Time Analysis

Abstract: The global nature of the ITER project, together with its projected data generation of approximately one petabyte per day, presents not only a unique challenge but also an opportunity for the fusion community to rethink, optimize, and enhance its scientific discovery process. Recognizing this, collaborative research with computational scientists was undertaken over the past several years to create a framework for large-scale data movement across wide-area networks, enabling global near-real-time analysis of fusion data. Such a framework would broaden the computational resources available for analysis and simulation and increase the number of researchers actively participating in experiments. An official demonstration of this framework for fast, large data transfer and real-time analysis was carried out between the KSTAR tokamak in Daejeon, Korea, and the Princeton Plasma Physics Laboratory (PPPL) in Princeton, New Jersey. Streaming transfer of large data, with near-real-time movie creation and analysis of the KSTAR electron cyclotron emission imaging data, was performed using the Adaptable Input Output (I/O) System (ADIOS) framework, and comparisons were made at PPPL with simulation results from the XGC1 code. These demonstrations were made possible by an optimized network configuration at PPPL, which achieved over 8.8 Gbps (88% utilization) in throughput tests from the National Fusion Research Institute to PPPL. The demonstration showed the feasibility of large-scale analysis of KSTAR data and provides a nascent framework for using globally distributed computational and personnel resources in pursuit of scientific knowledge from the ITER experiment.
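To put the quoted figures on a common scale, the short Python sketch below converts between daily data volume and sustained line rate. It is an illustration only, not part of the reported framework. It assumes decimal units (1 PB = 10^15 bytes, 1 Gbps = 10^9 bits/s) and assumes that the 88% utilization is measured against a single link, which would imply a capacity of about 10 Gbps; the abstract does not state the link capacity. The function names are hypothetical.

```python
# Back-of-the-envelope conversions for the figures quoted in the abstract.
# Assumptions (not stated in the source): decimal units, and utilization
# measured against the capacity of a single network link.

SECONDS_PER_DAY = 24 * 60 * 60


def sustained_gbps(bytes_per_day: float) -> float:
    """Average line rate in Gbps needed to move bytes_per_day within one day."""
    return bytes_per_day * 8 / SECONDS_PER_DAY / 1e9


def terabytes_per_day(gbps: float) -> float:
    """Data volume in TB moved in one day at a constant rate of gbps."""
    return gbps * 1e9 / 8 * SECONDS_PER_DAY / 1e12


def implied_link_gbps(measured_gbps: float, utilization: float) -> float:
    """Link capacity in Gbps implied by a measured rate and a utilization fraction."""
    return measured_gbps / utilization


if __name__ == "__main__":
    print(f"1 PB/day needs about {sustained_gbps(1e15):.1f} Gbps sustained")
    print(f"8.8 Gbps moves about {terabytes_per_day(8.8):.1f} TB/day")
    print(f"Implied link capacity: about {implied_link_gbps(8.8, 0.88):.1f} Gbps")
```

Under these assumptions, the demonstrated rate is roughly an order of magnitude below what continuous streaming of a full petabyte per day would require, which is consistent with the abstract describing the framework as nascent.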
