A Machine-Learning-Ready Dataset Prepared from the Solar and Heliospheric Observatory Mission

We present a Python tool that generates a standard dataset from solar images, allowing for user-defined selection criteria and a range of pre-processing steps. The tool works with all image products from both the Solar and Heliospheric Observatory (SoHO) and Solar Dynamics Observatory (SDO) missions. We discuss a dataset produced from the SoHO mission’s multi-spectral images that is free of missing or corrupt data and of planetary transits in coronagraph images, and that is temporally synced, making it ready for input to a machine learning system. Machine-learning-ready images are a valuable resource for the community because they can be used, for example, to forecast space weather parameters. We illustrate the use of this data with a 3-5 day-ahead forecast of the north-south component of the interplanetary magnetic field (IMF) observed at Lagrange point one (L1). For this use case, we apply a deep convolutional neural network (CNN) to a subset of the full SoHO dataset and compare the results with a baseline from a Gaussian Naive Bayes classifier.

1 Background & Summary

Studies based on physics models have shown that the solar magnetic field captured in magnetograms contains crucial information for estimating the speed of the solar wind, while the dynamical features of coronal mass ejections (CMEs), such as angular width and initial speed, are routinely inferred from coronagraph images¹,². Hence, applying machine learning (ML) techniques to the high-temporal-coverage data of both the Solar and Heliospheric Observatory³ (SoHO) mission and the Solar Dynamics Observatory⁴ (SDO) mission is expected to be a feasible venture that could improve the space weather forecasting capability of current models⁵⁻⁷. Because the quality of input data remains paramount to the success of these ML methods⁸⁻¹⁰ and to ensuring reproducible scientific research¹¹,¹², preparing a community-wide standard dataset with standard software is crucial.
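To make the idea of a temporally synced, gap-free dataset concrete, the following is a minimal sketch and not the tool's actual implementation. It assumes that syncing means matching each product to a reference product by nearest timestamp and dropping any time step where a product has no image within a tolerance. The function names, the tolerance, and the example timestamps are all hypothetical.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def nearest(times, target):
    """Return the entry of the sorted list `times` closest to `target` (None if empty)."""
    if not times:
        return None
    i = bisect_left(times, target)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = times[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda t: abs(t - target))


def sync_products(products, reference, tolerance):
    """Keep reference times for which every product has an image within `tolerance`.

    products: dict mapping product name -> sorted list of datetimes (hypothetical layout)
    Returns one dict per retained time step, mapping product name -> matched time.
    """
    synced = []
    for t_ref in products[reference]:
        row = {}
        for name, times in products.items():
            match = nearest(times, t_ref)
            if match is None or abs(match - t_ref) > tolerance:
                break  # one product has a gap here, so the whole step is dropped
            row[name] = match
        else:
            synced.append(row)
    return synced


# Hypothetical timestamps: EIT has no frame near 12:00.
t0 = datetime(2003, 10, 28)
products = {
    "MDI": [t0 + timedelta(hours=h) for h in (0, 6, 12, 18)],
    "EIT": [t0 + timedelta(hours=h, minutes=10) for h in (0, 6, 18)],
    "LASCO_C2": [t0 + timedelta(hours=h, minutes=-5) for h in (0, 6, 12, 18)],
}
synced = sync_products(products, reference="MDI", tolerance=timedelta(hours=1))
for row in synced:
    print({name: t.isoformat() for name, t in row.items()})
```

In this sketch the 12:00 step is discarded for every product, so a downstream model always receives a complete set of co-temporal images. The choice of reference product and tolerance is left to the user here.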
At present, SoHO has provided more temporal coverage of the Sun than its successor, NASA’s SDO, and has fully covered Solar Cycles 23 and 24 with a suite of on-board instruments³, including those specific to solar imaging: the Michelson Doppler Imager¹³ (MDI) for the solar photosphere, the Extreme ultraviolet Imaging Telescope¹⁴ (EIT) for the stellar atmosphere to low corona, and the Large Angle and Spectrometric Coronagraph¹⁵ (LASCO) covering the corona from 1.5−30 Rs, as detailed in Table 1. Recently, a white-light coronal brightness index (CBI), constructed from the full LASCO C2 mission archive, was used to explore correlations between the solar corona and several geophysical indices¹⁶. Although SDO has even higher resolution and cadence, SoHO continues to uniquely provide coronagraph products and serves as a mission-critical backup to SDO for solar flare and CME forecasting in the event of SDO failure.

Stanford University’s Joint Science Operations Center (JSOC) stores data from SoHO MDI, SDO HMI and AIA, and various other solar instruments. The SunPy-affiliated package DRMS enables querying of these images¹⁷,¹⁸. All of these individual image products from JSOC are at the same processing level and are supplied in a Flexible Image Transport System (FITS) format that contains only scalar values. The NASA Solar Data Analysis Center’s (SDAC) Virtual Solar Observatory¹⁹ (VSO) tool enables data queries from a number of individual data providers. SDAC’s terabytes of available EIT and LASCO images are also in FITS format. However, the SDAC data is highly heterogeneous. Not only are there intrinsic differences among these SoHO products (e.g., individual cadence as shown in Table 1), but there is also an irregular assortment of image file sizes and processing levels. These varying file sizes can correspond to different image resolutions, calibrations and orbital maneuvers, and multi-frame recordings.
In addition, all four of the publicly available EIT products require calibration to

arXiv:2108.06394v1 [astro-ph.SR] 4 Aug 2021
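The heterogeneity in image sizes and multi-frame recordings described above suggests a screening step before images are combined. The following is a minimal sketch and not the authors' procedure. It assumes the relevant FITS header values (the standard `NAXIS`, `NAXIS1` and `NAXIS2` keywords) have already been read into plain dictionaries, and it treats any file with more than two axes as a multi-frame recording. The function name, the record layout, and the file names are hypothetical.

```python
from collections import Counter


def screen_by_shape(records, expected_shape=None):
    """Split header records into (kept, rejected) by image shape.

    records: list of dicts holding 'NAXIS1' and 'NAXIS2' (and optionally 'NAXIS')
    expected_shape: (NAXIS1, NAXIS2) to keep; defaults to the most common shape.
    """
    if not records:
        return [], []
    shapes = [(rec["NAXIS1"], rec["NAXIS2"]) for rec in records]
    if expected_shape is None:
        expected_shape = Counter(shapes).most_common(1)[0][0]
    kept, rejected = [], []
    for rec, shape in zip(records, shapes):
        # Assumption: more than two axes indicates a multi-frame recording.
        single_frame = rec.get("NAXIS", 2) == 2
        if shape == expected_shape and single_frame:
            kept.append(rec)
        else:
            rejected.append(rec)
    return kept, rejected


# Hypothetical header summaries for four files.
records = [
    {"file": "a.fits", "NAXIS": 2, "NAXIS1": 1024, "NAXIS2": 1024},
    {"file": "b.fits", "NAXIS": 2, "NAXIS1": 512, "NAXIS2": 512},
    {"file": "c.fits", "NAXIS": 2, "NAXIS1": 1024, "NAXIS2": 1024},
    {"file": "d.fits", "NAXIS": 3, "NAXIS1": 1024, "NAXIS2": 1024},
]
kept, rejected = screen_by_shape(records)
print([rec["file"] for rec in kept])
print([rec["file"] for rec in rejected])
```

Defaulting to the most common shape is only one possible policy. Passing an explicit `expected_shape` is safer when a product's nominal resolution is known in advance.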

[1]  Prasanth H. Nair,et al.  Astropy: A community Python package for astronomy , 2013, 1307.6212.

[2]  Nitin Choudhary,et al.  drms: A Python package for accessing HMI and AIA data , 2019, J. Open Source Softw..

[3]  P. Mahadevan,et al.  An overview , 2007, Journal of Biosciences.

[4]  P. Lamy,et al.  Restoration of the K and F Components of the Solar Corona from LASCO-C2 Images over 24 Years [1996 – 2019] , 2020, Solar Physics.

[5]  Kevin Reardon,et al.  The SunPy Project: Open Source Development and Status of the Version 1.0 Core Package , 2020, The Astrophysical Journal.

[6]  Kevin Reardon,et al.  The Virtual Solar Observatory—A Resource for International Heliophysics Research , 2009 .

[7]  K. M. Laundal,et al.  Snakes on a Spaceship—An Overview of Python in Heliophysics , 2018, Journal of Geophysical Research: Space Physics.

[8]  Quentin F. Stout,et al.  The Space Weather Modeling Framework , 2005 .

[9]  Christopher. Simons,et al.  Machine learning with Python , 2017 .

[10]  Enrico Camporeale,et al.  The Challenge of Machine Learning in Space Weather: Nowcasting and Forecasting , 2019, Space Weather.

[11]  Thuy Mai,et al.  Solar Dynamics Observatory (SDO) , 2015 .

[12]  A. I. Yakimchik Jupyter Notebook: a system for interactive scientific computing , 2019 .

[13]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[14]  David J. McComas,et al.  The three‐dimensional solar wind around solar maximum , 2003 .

[15]  D. Baker,et al.  Sun Unleashes Halloween Storm , 2004 .

[16]  J. D. Meiss,et al.  Leveraging the mathematics of shape for solar magnetic eruption prediction , 2020, Journal of Space Weather and Space Climate.

[17]  Enrico Camporeale,et al.  Ten Ways to Apply Machine Learning in Earth and Space Sciences , 2021, Eos.

[18]  Alan M. Title,et al.  The solar oscillations investigation - Michelson Doppler Imager. , 1992 .

[19]  P. Lamy,et al.  The Large Angle Spectroscopic Coronagraph (LASCO) , 1995 .

[20]  Miguel de Val-Borro,et al.  The Astropy Project: Building an Open-science Project and Status of the v2.0 Core Package , 2018, The Astronomical Journal.

[21]  V. Domingo,et al.  The SOHO mission: An overview , 1995 .

[22]  R. Schwenn Space Weather: The Solar Perspective , 2006 .

[23]  Haimin Wang,et al.  Statistical Distributions of Speeds of Coronal Mass Ejections , 2005 .

[24]  Gaël Varoquaux,et al.  The NumPy Array: A Structure for Efficient Numerical Computation , 2011, Computing in Science & Engineering.

[25]  S. Krantz Fractal geometry , 1989 .

[26]  Simon P. Plunkett,et al.  Introduction to violent Sun‐Earth connection events of October–November 2003 , 2005 .

[27]  Towards Construction of a Solar Wind “Reanalysis” Dataset: Application to the First Perihelion Pass of Parker Solar Probe , 2019, Solar Physics.

[28]  Howard J. Singer,et al.  Space Weather Forecasting: A Grand Challenge , 2013 .

[29]  Simon Liedtke,et al.  SunPy—Python for solar physics , 2015, 1505.02563.

[30]  T. Pulkkinen,et al.  What sustained multi-disciplinary research can achieve: The space weather modeling framework , 2021, Journal of Space Weather and Space Climate.

[31]  G. Lapenta,et al.  Understanding space weather to shield society: A global road map for 2015-2025 commissioned by COSPAR and ILWS , 2015, 1503.06135.

[32]  W. Neupert,et al.  EIT: Extreme-ultraviolet Imaging Telescope for the SOHO mission , 1995 .

[33]  Monica G. Bobra,et al.  Machine Learning, Statistics, and Data Mining for Heliophysics , 2019 .

[34]  Brian E. Granger,et al.  IPython: A System for Interactive Scientific Computing , 2007, Computing in Science & Engineering.