Knowledge Discovery from Big Data for Intrusion Detection Using LDA

This paper explores a hybrid approach to intrusion detection based on knowledge discovery from big data using Latent Dirichlet Allocation (LDA). We identify the "hidden" patterns of operations conducted by both normal and malicious users from a large volume of network and system logs by mapping the problem to topic modeling and leveraging well-established LDA models and learning algorithms. This new approach potentially complements the strengths of signature-based and anomaly-based methods.
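The mapping from log analysis to topic modeling can be illustrated with a minimal sketch. The abstract does not specify how logs are turned into documents, so the encoding here is an assumption: each user session is treated as a "document", each logged operation as a "word", and each latent topic as a hidden pattern of operations. The session data, the `fit_lda` helper, and all hyperparameter values are hypothetical, and the collapsed Gibbs sampler below is one standard way to fit LDA, not necessarily the learning algorithm used in the paper.

```python
import random


def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative sketch).

    docs: list of token lists (here: one list of operations per session).
    Returns (theta, phi): per-document topic mixtures and per-topic
    word distributions.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]        # count of topic k in doc d
    topic_word = [[0] * V for _ in range(num_topics)]   # count of word v in topic k
    topic_total = [0] * num_topics                      # total tokens in topic k

    # Randomly assign an initial topic to every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed estimates of the topic mixtures and word distributions.
    theta = [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        {w: (topic_word[t][word_id[w]] + beta) / (topic_total[t] + V * beta)
         for w in vocab}
        for t in range(num_topics)
    ]
    return theta, phi


# Hypothetical sessions: each is the sequence of operations one user performed.
sessions = [
    ["login", "ls", "cd", "vim", "make", "logout"],
    ["login", "cd", "vim", "make", "make", "logout"],
    ["login", "wget", "chmod", "sudo", "rm_logs"],
    ["login", "wget", "chmod", "sudo", "sudo", "rm_logs"],
]

theta, phi = fit_lda(sessions, num_topics=2)
for session, mixture in zip(sessions, theta):
    print([round(p, 2) for p in mixture], session)
```

Each row of `theta` is a session's mixture over latent operation patterns, and each entry of `phi` gives the operations that characterize one pattern. In a setting like the one the abstract describes, such mixtures could then be compared against known-benign or known-malicious profiles; how that comparison is done is outside the scope of this sketch.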
