Bigdata logs analysis based on seq2seq networks for cognitive Internet of Things

Abstract: Big data systems process high-volume data at high speed, and in doing so they generate a large volume of logs. It is hard, however, to predict future events from massive, multi-source, heterogeneous big data logs. This paper proposes a comprehensive method for smart computation and prediction over massive logs in the Internet of Things (IoT). Traditional machine learning, Hidden Markov Model (HMM), and Autoregressive Integrated Moving Average (ARIMA) methods are not accurate enough for predicting time series data. In this work, we first describe the distributed collection and storage, event location, and vectorized representation of big data logs. Next, we present a log fusion algorithm that converts the logs (unstructured text data) of each big data component into structured data by removing noise and adding timestamps and classification labels. Then, we introduce a predictive model for big data systems: we use an attention mechanism to improve the sequence-to-sequence (seq2seq) algorithm and add an adjustor to fit the data distribution globally. Our experimental results show that the neural network model trained with our method performs well on real-world data. Compared with the previous predictive method, the root mean square error (RMSE) is reduced by 46.65% and the R-squared (R²) goodness of fit is improved by 14.28%.
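The two evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It uses the standard textbook definitions of RMSE and R²; the function names and the toy values are assumptions made for illustration only and are not taken from the paper's code or experiments.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when actual values are constant")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy values for illustration only; these are not the paper's data.
    observed = [10.0, 12.0, 15.0, 11.0]
    forecast = [11.0, 12.0, 14.0, 10.0]
    print("RMSE:", rmse(observed, forecast))
    print("R^2:", r_squared(observed, forecast))
```

A lower RMSE and an R² closer to 1 both indicate a better fit, which is the direction of the improvements reported above.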
