Achieving Digital-Twin Through Advanced Analytics Support: A Novelty Detection Framework to Highlight Real-Time Anomalies in Time Series

As industrial plants move towards massive digitalization, and their "digital twin" architectures are continually enriched with advanced analytics solutions that are becoming part of ordinary control operations, the quality of the input data becomes of paramount importance. But as the number of variables, KPIs and processes monitored in plants rises by several orders of magnitude, automated procedures are needed for effective monitoring. We have developed a novelty detection framework with the objective of monitoring each and every variable in the plant, detecting anomalies in real time and thereby allowing operators to investigate the possible causes in more depth before any damage is done. To do so, we had to define the "normality" of a signal, which can vary heavily from one signal to another. We therefore set up a learning procedure that integrates several steps. First, we need to distinguish the normal periods from the anomalous ones. While some failures are known to have affected certain sections of the plant at certain times, there will certainly be more anomalies that have remained hidden so far. We hence labelled each timestamp of the series using an isolation forest algorithm. Using the resulting normal dataset, we then extracted for each sensor the features that we identified as best characterizing a time series in its normal operating conditions. First, we select the signals in the plant that are most correlated with the one at hand, fit a Ridge Regression and estimate the residual distribution. Then, we extract statistics such as mean, standard deviation, mean length and frequency of frozen periods, outliers and NaNs, together with spectral features obtained from the Fast Fourier Transform and Welch’s method for spectral density estimation. Finally, we heuristically define a number of tests capable of distinguishing a normal from an anomalous time window.
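As an illustration only, the statistics step and the heuristic window tests described above might be sketched as follows. This is not the framework's actual implementation: the function names (`frozen_runs`, `extract_features`, `failed_tests`), the definition of a frozen period as at least three identical consecutive readings, and all thresholds (`z_max`, `nan_max`, `frozen_max`) are assumptions made for the example. The isolation forest labelling, the Ridge Regression residuals and the spectral features are left out for brevity.

```python
import math
import statistics


def frozen_runs(values, min_len=3):
    """Lengths of runs in which the reading repeats exactly (assumed definition of 'frozen')."""
    runs, run = [], 1
    for prev, cur in zip(values, values[1:]):
        if cur == prev:  # NaN never equals NaN, so gaps break a run
            run += 1
        else:
            if run >= min_len:
                runs.append(run)
            run = 1
    if run >= min_len:
        runs.append(run)
    return runs


def extract_features(values, min_frozen_len=3):
    """Summary statistics of a signal; expects at least one non-NaN value."""
    valid = [v for v in values if not math.isnan(v)]
    runs = frozen_runs(values, min_frozen_len)
    return {
        "mean": statistics.fmean(valid),
        "std": statistics.pstdev(valid),
        "nan_fraction": 1 - len(valid) / len(values),
        "frozen_count": len(runs),
        "frozen_mean_length": statistics.fmean(runs) if runs else 0.0,
    }


def failed_tests(window, baseline, z_max=3.0, nan_max=0.2, frozen_max=5):
    """Names of the heuristic tests that a time window fails (hypothetical thresholds)."""
    n_valid = sum(not math.isnan(v) for v in window)
    if n_valid == 0:
        return ["all_nan"]
    feats = extract_features(window)
    failed = []
    if feats["nan_fraction"] > nan_max:
        failed.append("too_many_nans")
    # Compare the window mean with the baseline mean, in units of standard error.
    std_err = baseline["std"] / math.sqrt(n_valid)
    if std_err > 0 and abs(feats["mean"] - baseline["mean"]) / std_err > z_max:
        failed.append("mean_shift")
    if any(length > frozen_max for length in frozen_runs(window)):
        failed.append("frozen")
    return failed
```

In this sketch, `extract_features` would be run once on data already labelled as normal to obtain the baseline, and `failed_tests` would then be applied to each incoming window. An empty list means that the window passed every test.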
With this procedure, we are able to detect in real time whenever a signal behaves differently from the way it is expected to. Depending on the operator’s experience, it is then possible to understand whether the anomaly is due to a sensor malfunction or indicates that something is wrong with the underlying physical phenomenon, leading to different actions, such as maintenance rescheduling. We have therefore conceived a dashboard that allows each operator to input their feedback, producing a refined dataset ready for continuous retraining. Ultimately, this anomaly detection framework will also be used to filter the inputs to many advanced analytics solutions, guaranteeing the quality of their results.