Paper Information - A Parallel Host Log Analysis Approach Based on Spark

Intrusion detection plays a key role in maintaining the security of computer networks. Host-based intrusion detection systems usually analyze log data to discover abnormal host behavior. In recent years, with the rapid growth of host log data generated by virtual machines in cloud environments, traditional log analysis methods have been limited by factors such as reliance on a single data source, isolated data, large data volumes, and insufficient single-node computing capability. To address this problem, this paper proposes a Spark-based method for processing host log data. The method first uses Spark SQL to expand the data dimensions and obtain more detailed dimensional data; it then uses Spark SQL to run queries (especially union queries) and counts over the complex data, giving a more comprehensive view of host health. A series of experiments shows that the proposed method scales with the platform and achieves good time performance in log data processing.
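The two steps named in the abstract (dimension expansion, then union query and counting) can be illustrated with a small sketch. This is not the authors' implementation: it uses Python's built-in sqlite3 as a stand-in for Spark SQL so that it runs without a cluster, and the table names, columns, sample records, and the failure rule are all hypothetical. The query sticks to common SQL constructs (WITH, UNION ALL, CASE, GROUP BY) that should carry over to Spark SQL with little or no change.

```python
import sqlite3

# Hypothetical log records from two sources: (host, timestamp, message).
AUTH_LOG = [
    ("vm-1", "2019-01-01 10:00:00", "Failed password for root"),
    ("vm-1", "2019-01-01 10:00:05", "Failed password for root"),
    ("vm-2", "2019-01-01 11:30:00", "Accepted password for alice"),
]
SYS_LOG = [
    ("vm-1", "2019-01-01 10:01:00", "kernel: out of memory"),
    ("vm-2", "2019-01-01 11:31:00", "service restarted"),
]

QUERY = """
    WITH events AS (
        -- Step 1: expand each raw line into extra dimensions
        -- (source, hour of day, failure flag).
        SELECT host, 'auth' AS source, substr(ts, 12, 2) AS hour,
               CASE WHEN message LIKE 'Failed%' THEN 1 ELSE 0 END AS is_failure
        FROM auth_log
        UNION ALL
        SELECT host, 'sys' AS source, substr(ts, 12, 2) AS hour,
               CASE WHEN message LIKE '%out of memory%' THEN 1 ELSE 0 END AS is_failure
        FROM sys_log
    )
    -- Step 2: count over the union of both sources per host and hour.
    SELECT host, hour, COUNT(*) AS n_events, SUM(is_failure) AS n_failures
    FROM events
    GROUP BY host, hour
    ORDER BY host, hour
"""


def analyze(auth_rows, sys_rows):
    """Load both log sources into an in-memory database and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE auth_log (host TEXT, ts TEXT, message TEXT)")
        conn.execute("CREATE TABLE sys_log (host TEXT, ts TEXT, message TEXT)")
        conn.executemany("INSERT INTO auth_log VALUES (?, ?, ?)", auth_rows)
        conn.executemany("INSERT INTO sys_log VALUES (?, ?, ?)", sys_rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


result = analyze(AUTH_LOG, SYS_LOG)
for row in result:
    print(row)
```

Combining the sources with UNION ALL before aggregating is what lets a single count cover events that a single-source analysis would see only in part.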

Yong Wang | Hao Feng | Xinpeng Li | Wenlong Ke
