Comparative Analysis of the Prediction of Coronal Mass Ejections (CMEs) Based on Sunspot Activity Using Various Machine Learning Models

In this paper, several classification learning algorithms were used to explore the relationship between sunspot counts and coronal mass ejections (CMEs). The aim was to determine which of these models best predicts the initiation of CMEs from the sunspot counts and the solar cycle. The NOAA daily American sunspot number and the SOHO/LASCO CME catalogue were processed to create the dataset used by the classification algorithms. After extensive experimentation with various features from the catalogues, the four most suitable features (day, month, year and sunspot count) were selected as the inputs for classification. We compare the accuracy of several classification techniques, namely decision trees, nearest neighbor, support vector machines, discriminant analysis, ensembles and logistic regression, using Receiver Operating Characteristic (ROC) analysis and confusion matrices. Each classifier takes the date and sunspot number as input and outputs whether or not a CME will be initiated on that date. The dataset was split into training and testing sets using k-fold cross-validation. Among the models used here, the Ensemble Bagged Tree model performed best, with an accuracy of 90.8%. Future work could improve predictive performance on this dataset by using neural network models.
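The abstract does not state exactly how the two catalogues were joined into labelled instances. A minimal sketch, assuming each calendar day in the sunspot record becomes one instance with features (day, month, year, sunspot count) and a binary label that is 1 if at least one CME in the CME catalogue is dated on that day and 0 otherwise. The function name `build_dataset`, the input formats and this labelling rule are hypothetical, not taken from the paper.

```python
from datetime import date
from typing import Iterable


def build_dataset(
    sunspots: dict[date, int],
    cme_dates: Iterable[date],
) -> tuple[list[list[int]], list[int]]:
    """Hypothetical labelling rule (an assumption, not the paper's code).

    Returns (X, y), where each row of X is [day, month, year, sunspot_count]
    and y is 1 if any CME is dated on that day, else 0.
    """
    # Several CMEs on the same day collapse into a single positive label.
    cme_days = set(cme_dates)
    X: list[list[int]] = []
    y: list[int] = []
    # Iterate in chronological order so rows follow the sunspot record.
    for d in sorted(sunspots):
        X.append([d.day, d.month, d.year, sunspots[d]])
        y.append(1 if d in cme_days else 0)
    return X, y
```

Under this assumption the problem is framed as one binary decision per day, which matches the abstract's description of a classifier that outputs whether a CME occurs for a given date and sunspot number.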
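The evaluation the abstract describes can be sketched with scikit-learn: a bagged decision-tree ensemble, k-fold cross-validation, a confusion matrix and ROC analysis. The abstract does not name the software used, so the choice of scikit-learn is an assumption. The helper name `evaluate_bagged_trees`, the number of folds and the number of trees are also hypothetical, not values reported in the paper. Every prediction is made by a model that did not see that sample during training. The confusion matrix and ROC AUC are returned alongside accuracy because accuracy alone can be misleading when the two classes are unbalanced.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.tree import DecisionTreeClassifier


def evaluate_bagged_trees(X, y, n_splits=5, n_trees=30, seed=0):
    """Cross-validated metrics for a bagged decision-tree ensemble.

    n_splits and n_trees are hypothetical defaults, not the paper's settings.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)
    # The tree is passed positionally because the keyword was renamed
    # (base_estimator -> estimator) in scikit-learn 1.2.
    model = BaggingClassifier(
        DecisionTreeClassifier(),
        n_estimators=n_trees,
        random_state=seed,
    )
    # Stratified k-fold keeps the CME / no-CME ratio similar in every fold.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # Probability of class 1 for each sample, from the model trained on the
    # other k-1 folds.
    proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
    pred = (proba >= 0.5).astype(int)  # simple 0.5 decision threshold
    return {
        "accuracy": accuracy_score(y, pred),
        "confusion_matrix": confusion_matrix(y, pred, labels=[0, 1]),
        "roc_auc": roc_auc_score(y, proba),
    }
```

The same routine could be repeated with other scikit-learn classifiers (for example `KNeighborsClassifier`, `SVC` or `LogisticRegression`) to produce a side-by-side comparison of the kind the abstract describes.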