Virtual Knowledge Graphs for Federated Log Analysis

Security professionals rely extensively on log data to monitor IT infrastructures and investigate potentially malicious activities. Existing systems support these tasks by collecting log messages in a database, from which log events can be queried and correlated. Such centralized approaches are typically based on a relational model and store log messages as plain text, which offers limited flexibility for representing heterogeneous log events and the connections between them. A knowledge graph representation can overcome these limitations and enable graph pattern-based log analysis that leverages semantic relationships between objects appearing in heterogeneous log streams. In this paper, we present a method to construct such log knowledge graphs dynamically at query time, i.e., without a priori parsing, aggregation, processing, and materialization of log data. Specifically, for a given query formulated in SPARQL, the method dynamically constructs a virtual log knowledge graph directly from heterogeneous raw log files across multiple hosts and contextualizes the result with internal and external background knowledge. We evaluate the approach across multiple heterogeneous log sources and machines. The results are encouraging: they indicate that the approach is viable and that it facilitates ad-hoc graph-analytic queries in federated settings.
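The idea of building the graph only at query time can be illustrated with a minimal Python sketch. It is not the authors' implementation and uses no SPARQL engine: the log format, the `log:` and `event:` prefixes, and the helper names (`triples_from_logs`, `match`, `failed_logins_by_ip`) are all assumptions made for illustration. Raw lines stay unparsed until a query arrives, at which point they are turned into triples and matched against a simple pattern.

```python
import re
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical raw log lines per host; the line format is an assumption.
RAW_LOGS: Dict[str, List[str]] = {
    "host-a": [
        "2020-03-01T10:00:00 sshd Failed password for alice from 10.0.0.5",
        "2020-03-01T10:00:05 sshd Accepted password for bob from 10.0.0.7",
    ],
    "host-b": [
        "2020-03-01T10:01:00 sshd Failed password for alice from 10.0.0.5",
    ],
}

LINE_RE = re.compile(
    r"^(?P<ts>\S+) (?P<proc>\S+) (?P<outcome>Failed|Accepted) password "
    r"for (?P<user>\S+) from (?P<ip>\S+)$"
)


def triples_from_logs(logs: Dict[str, List[str]]) -> Iterator[Triple]:
    """Lazily parse raw lines into triples; nothing is stored in advance."""
    for host, lines in logs.items():
        for index, line in enumerate(lines):
            m = LINE_RE.match(line)
            if m is None:
                continue  # skip lines the illustrative pattern does not cover
            event = f"event:{host}/{index}"
            yield (event, "log:host", host)
            yield (event, "log:timestamp", m["ts"])
            yield (event, "log:outcome", m["outcome"])
            yield (event, "log:user", m["user"])
            yield (event, "log:sourceIp", m["ip"])


def match(
    logs: Dict[str, List[str]],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> List[Triple]:
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t
        for t in triples_from_logs(logs)
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def failed_logins_by_ip(logs: Dict[str, List[str]], ip: str) -> List[str]:
    """Join two triple patterns: events from `ip` whose outcome is Failed."""
    from_ip = {s for s, _, _ in match(logs, p="log:sourceIp", o=ip)}
    failed = {s for s, _, _ in match(logs, p="log:outcome", o="Failed")}
    return sorted(from_ip & failed)


if __name__ == "__main__":
    print(failed_logins_by_ip(RAW_LOGS, "10.0.0.5"))
```

In this sketch each query re-parses the raw lines, which trades query latency for the absence of any up-front materialization; a real federated setup would run the extraction on each host and return only the matching triples.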
