Call Trace and Memory Access Pattern based Runtime Insider Threat Detection for Big Data Platforms

Big data platforms such as Hadoop and Spark are being widely adopted by both academia and industry. In this paper, we propose a runtime intrusion detection technique designed around the properties of such distributed compute platforms. The proposed method is based on runtime analysis of the system and library calls and the memory access patterns of tasks running on the datanodes (slaves). First, the primary datanode of a big data system creates a behavior profile for every task it executes. A behavior profile includes (a) a trace of the system and library calls made, and (b) a sequence representing the sizes of the private and shared memory accesses made during task execution. Then, this behavior profile is shared with the replica datanodes that are scheduled to execute the same task on their copy of the same data. Next, the replica datanodes verify their local tasks against the information embedded in the received behavior profiles. This verification is realized in two steps: (i) comparing the system and library call metadata, and (ii) statistically matching the memory access patterns. Finally, the datanodes share their observations to reach consensus and report an intrusion to the namenode (master) if they find any discrepancy. The proposed solution was tested on a small Hadoop cluster using the default MapReduce examples, and the results show that our approach can detect insider attacks that cannot be detected with traditional analysis metrics.
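The verification flow described above can be illustrated with a minimal Python sketch. It is not the implementation evaluated in this paper. All names (`BehaviorProfile`, `calls_match`, `ks_statistic`, `memory_matches`, `verify`, `intrusion_detected`) are hypothetical, the call metadata is assumed to be per-call counts, the statistical matching is assumed to be a two-sample Kolmogorov-Smirnov distance on access sizes, and the threshold of 0.3 is an arbitrary illustrative value.

```python
from bisect import bisect_right
from collections import Counter
from dataclasses import dataclass


@dataclass
class BehaviorProfile:
    """Hypothetical profile of one task on one datanode."""
    calls: list      # ordered names of system and library calls
    mem_sizes: list  # sizes of memory accesses (bytes), in order


def calls_match(received, local):
    # Step (i): compare call metadata.
    # Assumption: the metadata is the number of times each call was made.
    return Counter(received.calls) == Counter(local.calls)


def ks_statistic(xs, ys):
    # Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    # the empirical CDFs of the two samples (0.0 = identical, 1.0 = disjoint).
    if not xs or not ys:
        return 0.0 if not xs and not ys else 1.0
    xs, ys = sorted(xs), sorted(ys)
    gap = 0.0
    for v in set(xs) | set(ys):
        fx = bisect_right(xs, v) / len(xs)
        fy = bisect_right(ys, v) / len(ys)
        gap = max(gap, abs(fx - fy))
    return gap


def memory_matches(received, local, threshold=0.3):
    # Step (ii): statistical matching of memory access patterns.
    # Assumption: profiles match when the KS distance is within threshold.
    return ks_statistic(received.mem_sizes, local.mem_sizes) <= threshold


def verify(received, local):
    # A replica accepts its local task only if both steps pass.
    return calls_match(received, local) and memory_matches(received, local)


def intrusion_detected(observations):
    # Consensus step: report to the namenode if any replica saw a discrepancy.
    return not all(observations)
```

In this sketch, each replica would call `verify` with the profile received from the primary datanode and its own locally recorded profile, and the resulting booleans would be pooled through `intrusion_detected`. Comparing call counts ignores call ordering, so a stricter variant could compare the traces themselves.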