A Parallel Host Log Analysis Approach Based on Spark

Intrusion detection plays a key role in maintaining the security of computer networks. Host-based intrusion detection systems usually analyze log data to discover abnormal host behavior. In recent years, virtual machines in cloud environments have generated rapidly growing volumes of host log data, and traditional log analysis methods are limited by factors such as reliance on a single data source, mutually independent data, large data volume, and insufficient single-point computing capability. To address these problems, this paper proposes a Spark-based method for processing host log data. The method first uses Spark SQL to expand the data dimensions and obtain more detailed dimensional data; it then uses Spark SQL to perform queries (especially union queries) and counting over complex data, giving a more comprehensive view of host health. A series of experiments shows that the proposed method scales with the platform and achieves good time performance in log data processing.