Haddle: A Framework for Investigating Data Leakage Attacks in Hadoop

Hadoop is popular among businesses and individuals for its low cost, convenience, and speed. That popularity also makes it a target for data leakage attacks, as the volume of sensitive data stored in HDFS infrastructure grows rapidly. It is therefore important to be able to investigate such attacks in Hadoop. Several works have addressed improving the security of Hadoop, but hardly any have addressed the investigation of data leakage. This paper presents a typical data leakage attack scenario in Hadoop and proposes Haddle (Hadoop Data Leakage Explorer), a two-stage forensic framework that combines automatic analytical methods with on-demand data collection. With the assistance of Haddle, investigators can identify the stolen data, identify the perpetrator who stole it, and reconstruct the crime scene. Haddle can also help improve the audit mechanism of Hadoop.