Adaptive Normalization in Streaming Data

In today's digital era, data are everywhere, from the Internet of Things to health care and financial applications. The result is potentially unbounded, ever-growing Big Data streams that need to be utilized effectively. Data normalization is an important preprocessing technique for data analytics. It helps prevent mismodeling and reduces the complexity inherent in the data, especially for data integrated from multiple sources and contexts. Normalizing a Big Data stream is challenging because of evolving inconsistencies, time and memory constraints, and the unavailability of the whole data set beforehand. This paper proposes a distributed approach to adaptive normalization for Big Data streams. Using sliding windows of fixed size, it provides a simple mechanism to adapt the statistics used for normalizing the changing data in each window. Implemented on Apache Storm, a distributed real-time stream processing framework, our approach exploits distributed data processing for efficient normalization. Unlike existing adaptive approaches that normalize data for a specific use (e.g., classification), ours is not tied to a particular downstream task. Moreover, our adaptive mechanism allows flexible control, via user-specified thresholds, over the normalization tradeoff between time and precision. The paper illustrates the proposed approach alongside a few other techniques, with experiments on both synthesized and real-world data. On a data stream of 160,000 instances, the normalized data obtained from our approach improves over the baseline by 89%, with a root-mean-square error of 0.0041 compared with the actual data.
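
The abstract does not spell out the update rule, so the following is only a minimal single-process sketch of the general idea, under stated assumptions: min-max scaling is used as the normalization, windows are consecutive fixed-size blocks rather than overlapping ones, and the reference statistics are refreshed only when a window's minimum or maximum drifts from them by more than a user-specified fraction of the current range. The function name `normalize_stream` and the parameters `window_size` and `threshold` are hypothetical, and nothing here reflects the Apache Storm implementation.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def normalize_stream(
    stream: Iterable[float],
    window_size: int = 4,
    threshold: float = 0.1,
) -> Iterator[List[float]]:
    """Yield min-max normalized windows of a numeric stream.

    Assumption (not from the paper): the reference (min, max) pair is
    replaced only when the current window's min or max differs from it
    by more than `threshold` times the reference range.
    """
    ref: Optional[Tuple[float, float]] = None  # (min, max) currently in use
    window: List[float] = []
    for x in stream:
        window.append(x)
        if len(window) < window_size:
            continue  # keep filling the current window
        lo, hi = min(window), max(window)
        if ref is None:
            ref = (lo, hi)  # first window initializes the statistics
        else:
            span = (ref[1] - ref[0]) or 1.0  # guard against a zero range
            drift = max(abs(lo - ref[0]), abs(hi - ref[1])) / span
            if drift > threshold:
                ref = (lo, hi)  # statistics are stale: adapt to this window
        span = (ref[1] - ref[0]) or 1.0
        yield [(v - ref[0]) / span for v in window]
        window = []  # start the next window; a trailing partial one is dropped


if __name__ == "__main__":
    data = [0, 1, 2, 4, 0, 1, 2, 4, 10, 20, 30, 50]
    for normalized in normalize_stream(data, window_size=4, threshold=0.1):
        print(normalized)
```

In this sketch a small `threshold` refreshes the statistics often, which costs more computation but keeps values within [0, 1]; a large `threshold` reuses stale statistics, which is cheaper but lets values leave that range. That is one plausible reading of the time and precision tradeoff the abstract mentions.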
