Log File Analysis Using MapReduce to Improve Security

Abstract Log files are a valuable source of information for diagnosing system security and detecting problems that occur in a system, but they are often very large and can have a complex structure. In this paper, we propose a security analysis methodology that applies Big Data techniques, such as MapReduce, to several system log files in order to locate and extract data likely related to attacks by malicious users who intend to compromise a system. Through a learning process, these data can then be used to identify and predict attacks or to detect intrusions. We illustrate this approach with a concrete case study that exploits the access log files of Apache web servers to predict and detect SQL injection (SQLI) and distributed denial-of-service (DDoS) attacks. The results obtained are promising: we are able to extract malicious indicators and events that characterize intrusions, which helps us to make an accurate diagnosis of the security state of the system, to supervise it, and subsequently to feed the learning process.
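To make the idea concrete, the following is a minimal, single-process sketch of a map step and a reduce step over Apache access log lines. It is an illustration under stated assumptions, not the implementation evaluated in this paper: the function names (map_line, reduce_pairs, analyze), the SQL injection patterns, and the request-count threshold used as a DDoS indicator are all hypothetical, and a real deployment would run the same two steps on a MapReduce framework rather than in plain Python.

```python
import re
from collections import defaultdict
from urllib.parse import unquote_plus

# Prefix of the Apache Common/Combined Log Format:
# host ident user [time] "method url protocol" status size
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+)[^"]*" (\d{3}) \S+')

# Illustrative SQL injection indicators (an assumption, not the paper's rule set).
SQLI_RE = re.compile(
    r"(union\s+select|\bor\s+1\s*=\s*1|;\s*drop\s+table)", re.IGNORECASE
)


def map_line(line):
    """Map step: emit ((kind, ip), 1) pairs for one access-log line."""
    m = LOG_RE.match(line)
    if not m:
        return []  # skip lines that do not look like access-log entries
    ip, _method, url, _status = m.groups()
    pairs = [(("requests", ip), 1)]
    # Decode percent-encoding so that "%20UNION%20SELECT" is also caught.
    if SQLI_RE.search(unquote_plus(url)):
        pairs.append((("sqli", ip), 1))
    return pairs


def reduce_pairs(pairs):
    """Reduce step: sum the values emitted for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def analyze(lines, ddos_threshold=100):
    """Return (IPs with SQLI indicators, IPs at or above the request threshold)."""
    pairs = [pair for line in lines for pair in map_line(line)]
    totals = reduce_pairs(pairs)
    sqli_ips = sorted(ip for (kind, ip) in totals if kind == "sqli")
    ddos_ips = sorted(
        ip
        for (kind, ip), count in totals.items()
        if kind == "requests" and count >= ddos_threshold
    )
    return sqli_ips, ddos_ips
```

Calling analyze on an iterable of log lines returns two sorted lists of client addresses, which could then serve as candidate indicators for the learning process described above. The threshold of 100 requests is an arbitrary placeholder and would need to be tuned per time window and per site.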