Real-Time Outlier Detection and Bayesian Classification using Incremental Computations for Efficient and Scalable Stream Analytics for IoT for Manufacturing

Abstract As the manufacturing industry moves towards the Internet of Things (IoT) and Cyber-Physical Systems (CPS), conventional analytics on historical data struggle to address the new challenges that accompany Industry 4.0. Industry 4.0 and IoT technologies facilitate the ubiquitous acquisition of data from machine tools and processes. However, these technologies also generate large volumes of data that are complex to analyze. Because IoT systems are inherently streaming, stream analytics can extract features as the data are generated and published, which avoids the need to store the data and run advanced analytics that require high-performance computing. This manuscript demonstrates how traditional historical methods can be modified to serve as stream analytics tools for IoT data streams. Since data analytics is a wide domain, this paper focuses on two lightweight methods that are popular in industry: the Statistical Process Control Chart (SPCC) and Bayesian classification. The paper defines, tests, and evaluates the accuracy and latency of novel variations of these methods. It concludes that modifying the traditional methods and defining incremental solutions yields methods such as the Real-Time Dynamic Statistical Process Control Chart (RTDSPCC) and Incremental Gaussian Naïve Bayes (IGNB), which are highly beneficial for IoT applications because they are highly scalable, require minimal storage, and can update their models in real time.
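
To illustrate what an incremental computation for a control chart can look like, the following minimal sketch keeps a running mean and variance with Welford's update and flags readings outside k-sigma limits. It is not the authors' RTDSPCC: the class name `StreamingControlChart`, the parameters `k` and `warmup`, the sample readings, and the choice to fold every reading (including flagged ones) into the statistics are all assumptions made for illustration.

```python
import math


class StreamingControlChart:
    """Hypothetical k-sigma control chart with O(1) memory per stream."""

    def __init__(self, k=3.0, warmup=5):
        self.k = k            # width of the control limits in standard deviations
        self.warmup = warmup  # readings to observe before flagging anything
        self.n = 0            # number of readings seen so far
        self.mean = 0.0       # running mean
        self.m2 = 0.0         # running sum of squared deviations from the mean

    def limits(self):
        """Return (lower, upper) control limits from the current statistics."""
        std = math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0
        return self.mean - self.k * std, self.mean + self.k * std

    def update(self, x):
        """Test x against the current limits, then fold x into the statistics."""
        is_outlier = False
        if self.n >= self.warmup:
            lower, upper = self.limits()
            is_outlier = x < lower or x > upper
        # Welford's update: no past readings need to be stored.
        # Assumption: flagged readings are also folded in; a real chart
        # might exclude them to keep the limits from widening.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_outlier


# Made-up sensor readings with one obvious spike.
readings = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 25.0, 10.1]
chart = StreamingControlChart()
flags = [chart.update(x) for x in readings]
print(flags)
```

Each call to `update` touches only three numbers, so storage stays constant regardless of how long the stream runs. The same running per-feature mean and variance, kept separately for each class, is one common way to make a Gaussian Naïve Bayes classifier incremental.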
