Network log analysis based on the topic word mover's distance

Telecommunication networks continuously generate various system logs that contain rich information about system status, so these logs can be used to detect whether a network is in a fault state. In this paper, we propose an improved word mover's distance (WMD), called the Topic Word Mover's Distance (T-WMD), which measures the distance between two log samples. We then use it to classify fault logs and thereby determine the root cause of a fault. Compared with the original WMD, T-WMD takes topic information into account and therefore captures more of the latent semantic information in the log corpus. Experiments on k-nearest-neighbor (k-NN) classification of fault logs show that T-WMD outperforms the original WMD.
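As a concrete reference point for the original WMD that T-WMD builds on, the following is a minimal Python sketch under stated assumptions. Each log is reduced to a normalized bag of words (nBOW), the cost of moving word i to word j is the Euclidean distance between their embeddings, and the distance is the minimum total cost of moving all of one log's word mass onto the other's: min over T ≥ 0 of Σ_ij T_ij·c(i, j), subject to the row sums of T equalling the first log's word weights and the column sums equalling the second log's. The transport problem is solved with a small successive-shortest-path min-cost-flow routine so that only the standard library is needed. The toy 2-D vectors in EMB, the fault labels, and the helper names (nbow, wmd, knn_predict) are hypothetical illustrations, not data or code from the paper. The abstract does not state how T-WMD injects topic information, so that modification is not sketched.

```python
import math
from collections import Counter

# Hypothetical toy 2-D word vectors for a few log tokens (illustration only;
# WMD normally uses pretrained embeddings such as word2vec).
EMB = {
    "link": (0.0, 1.0), "down": (1.0, 1.0), "interface": (0.1, 0.9), "lost": (0.9, 1.1),
    "cpu": (5.0, 0.0), "high": (6.0, 0.0), "load": (5.1, 0.2),
}


def nbow(tokens):
    """Normalized bag-of-words: weight of each in-vocabulary token = count / total."""
    counts = Counter(t for t in tokens if t in EMB)
    total = sum(counts.values())  # must be > 0: at least one token needs an embedding
    return {w: c / total for w, c in counts.items()}


def wmd(doc_a, doc_b, eps=1e-12):
    """Word Mover's Distance computed as a successive-shortest-path min-cost flow."""
    a, b = nbow(doc_a), nbow(doc_b)
    src, dst = list(a), list(b)
    n, m = len(src), len(dst)
    S, T = 0, n + m + 1  # node 0 = source, 1..n = doc_a words, n+1..n+m = doc_b words
    graph = [[] for _ in range(n + m + 2)]  # edge: [to, residual capacity, cost, reverse index]

    def add_edge(u, v, cap, cost):
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0.0, -cost, len(graph[u]) - 1])

    for i, w in enumerate(src):
        add_edge(S, 1 + i, a[w], 0.0)  # supply = word weight in doc_a
        for j, v in enumerate(dst):
            add_edge(1 + i, n + 1 + j, 1.0, math.dist(EMB[w], EMB[v]))  # travel cost
    for j, v in enumerate(dst):
        add_edge(n + 1 + j, T, b[v], 0.0)  # demand = word weight in doc_b

    total_cost = 0.0
    while True:
        # Bellman-Ford on the residual graph (reverse edges carry negative cost).
        dist = [math.inf] * len(graph)
        prev = [None] * len(graph)
        dist[S] = 0.0
        for _ in range(len(graph) - 1):
            updated = False
            for u, edges in enumerate(graph):
                if dist[u] == math.inf:
                    continue
                for k, (v, cap, cost, _rev) in enumerate(edges):
                    if cap > eps and dist[u] + cost < dist[v] - eps:
                        dist[v], prev[v] = dist[u] + cost, (u, k)
                        updated = True
            if not updated:
                break
        if dist[T] == math.inf:
            return total_cost  # no augmenting path left: all mass has been moved
        # Push the bottleneck amount of mass along the cheapest path.
        flow, v = math.inf, T
        while v != S:
            u, k = prev[v]
            flow, v = min(flow, graph[u][k][1]), u
        v = T
        while v != S:
            u, k = prev[v]
            edge = graph[u][k]
            edge[1] -= flow
            graph[v][edge[3]][1] += flow
            v = u
        total_cost += flow * dist[T]


def knn_predict(query, train, k=3):
    """k-NN: majority label among the k training logs nearest to `query` under WMD."""
    nearest = sorted(train, key=lambda item: wmd(query, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical labelled training logs (tokens without an embedding, like "on", are ignored).
train = [
    ("link down on interface".split(), "link_fault"),
    ("interface lost link".split(), "link_fault"),
    ("cpu load high".split(), "cpu_overload"),
    ("high cpu".split(), "cpu_overload"),
]
print(knn_predict("link lost".split(), train, k=3))  # → link_fault
```

Tokens missing from EMB are skipped, so each log must contain at least one in-vocabulary token. For real corpora, an LP or network-simplex solver would be much faster than this Bellman-Ford loop, and the ground distance c(i, j) inside wmd is the single place where a different word-to-word cost could be substituted.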
