Data-Driven-Based Disruption Prediction in GOLEM Tokamak with Missing Values
Sensor readings from the vessel of a tokamak are inconsistent: some sensors always produce output, while others report only occasionally, which leaves missing values throughout the dataset. Typically, imputation methods are applied to both the training and test data when a model is trained offline. In a real-time application, however, where decisions must be made on incoming sensor data, imputation techniques are not practical. Hence, this paper employs a data-driven approach built on two groups of algorithms: those that inherently handle missing values, and those that provide a way to deal with missing values through a replacement technique. Stand-alone, bagging, and boosting algorithms are used to classify normal and disruptive discharges in the GOLEM tokamak dataset, which consists of 117 normal and 70 disruptive shots. Boosting algorithms with built-in handling of missing values outperformed the other algorithms, and Categorical Boosting (CatBoost), with its ordered boosting feature, gave the best metrics. Optimal thresholds on the receiver operating characteristic (ROC) and precision–recall (PR) curves of the models are determined, and the optimal PR thresholds are then used to improve the results. A comparison with widely used stand-alone machine learning algorithms and ensemble algorithms is presented. The CatBoost model performs best, reaching an F1 score of 0.943 at the optimal PR threshold. The developed predictive model would be capable of warning the human operator and providing feedback about the feature(s) causing the disruption.
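To make the threshold-selection step concrete, the sketch below picks a decision threshold from classifier scores in two common ways: maximizing F1 along the PR curve, and maximizing Youden's J (true positive rate minus false positive rate) along the ROC curve. The abstract does not state which optimality criteria the authors used, so both criteria are assumptions, and the function names `best_f1_threshold` and `best_youden_threshold` as well as the toy labels and scores are purely illustrative and are not GOLEM data.

```python
import numpy as np


def _confusion_counts(y_true, scores, thr):
    """Count TP, FP, FN, TN when predicting 'disruptive' for scores >= thr."""
    pred = scores >= thr
    tp = int(np.sum(pred & y_true))
    fp = int(np.sum(pred & ~y_true))
    fn = int(np.sum(~pred & y_true))
    tn = int(np.sum(~pred & ~y_true))
    return tp, fp, fn, tn


def best_f1_threshold(y_true, scores):
    """Return (threshold, F1) for the candidate threshold with the highest F1.

    Assumed criterion: F1 maximization on the PR curve.
    """
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    best_thr, best_f1 = None, -1.0
    for thr in np.unique(scores):  # candidates in ascending order
        tp, fp, fn, _ = _confusion_counts(y_true, scores, thr)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:  # strict comparison keeps the lowest tied threshold
            best_thr, best_f1 = float(thr), f1
    return best_thr, best_f1


def best_youden_threshold(y_true, scores):
    """Return (threshold, J) maximizing Youden's J = TPR - FPR.

    Assumed criterion for the ROC curve; requires both classes to be present.
    """
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    best_thr, best_j = None, -np.inf
    for thr in np.unique(scores):
        tp, fp, fn, tn = _confusion_counts(y_true, scores, thr)
        j = tp / (tp + fn) - fp / (fp + tn)
        if j > best_j:
            best_thr, best_j = float(thr), j
    return best_thr, best_j


# Toy example (illustrative only): 1 = disruptive shot, 0 = normal shot.
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.7, 0.9]

thr_f1, f1 = best_f1_threshold(y_true, scores)
thr_j, j = best_youden_threshold(y_true, scores)
print(f"PR-based threshold: {thr_f1}, F1 = {f1:.3f}")
print(f"ROC-based threshold: {thr_j}, J = {j:.3f}")
```

With imbalanced classes such as 117 normal versus 70 disruptive shots, the PR-based threshold focuses on the disruptive class alone, which may be one reason a PR-derived operating point is preferred over the default 0.5 cut-off.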