Big Data Analytics for Detecting Host Misbehavior in Large Logs

The management of complex network infrastructures continues to be a difficult endeavor today. These infrastructures can contain a huge number of devices that may misbehave in unpredictable ways. Many of these devices keep logs that contain valuable information about the infrastructures' security, reliability, and performance. However, extracting information from that data is far from trivial. The paper presents a novel approach to assess the security of such an infrastructure using its logs, inspired on data from a real telecommunications network. We use machine learning and data mining techniques to analyze the data and semi-automatically discover misbehaving hosts, without having to instruct the system about how hosts misbehave.

[1]  Mark Strembeck,et al.  A User Profile Derivation Approach based on Log-File Analysis , 2007, IKE.

[2]  Wei Wang,et al.  Using Large Scale Distributed Computing to Unveil Advanced Persistent Threats , 2012 .

[3]  Alvaro A. Cárdenas,et al.  Big Data Analytics for Security , 2013, IEEE Security & Privacy.

[4]  William K. Robertson,et al.  Beehive: large-scale log analysis for detecting suspicious activity in enterprise networks , 2013, ACSAC.

[5]  Ian H. Witten,et al.  Data Mining: Practical Machine Learning Tools and Techniques, 3/E , 2014 .

[6]  U. Fayyad,et al.  Scaling EM (Expectation Maximization) Clustering to Large Databases , 1998 .

[7]  อนิรุธ สืบสิงห์,et al.  Data Mining Practical Machine Learning Tools and Techniques , 2014 .

[8]  Navjot Singh,et al.  Log Analytics for Dependable Enterprise Telephony , 2012, 2012 Ninth European Dependable Computing Conference.

[9]  Wei Xu,et al.  Advances and challenges in log analysis , 2011, Commun. ACM.

[10]  Tom White,et al.  Hadoop: The Definitive Guide , 2009 .

[11]  Jon Stearley,et al.  Bridging the Gaps: Joining Information Sources with Splunk , 2010, SLAML.

[12]  Michael W. Godfrey,et al.  Mining modern repositories with elasticsearch , 2014, MSR 2014.

[13]  Aart J. C. Bik,et al.  Pregel: a system for large-scale graph processing , 2010, SIGMOD Conference.

[14]  Dorothy E. Denning,et al.  An Intrusion-Detection Model , 1987, IEEE Transactions on Software Engineering.

[15]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[16]  Yakov Shafranovich,et al.  Common Format and MIME Type for Comma-Separated Values (CSV) Files , 2005, RFC.

[17]  Jianfeng Zhan,et al.  LogMaster: Mining Event Correlations in Logs of Large-Scale Cluster Systems , 2010, 2012 IEEE 31st Symposium on Reliable Distributed Systems.