A Study on Security Improvement in Hadoop Distributed File System Based on Kerberos

As the developments of smart devices and social network services, the amount of data has been exploding. The world is facing Big data era. For these reasons, the Big data processing technology which is a new technology that can handle such data has attracted much attention. One of the most representative technologies is Hadoop. Hadoop Distributed File System(HDFS) designed to run on commercial Linux server is an open source framework and can store many terabytes of data. The initial version of Hadoop did not consider security because it only focused on efficient Big data processing. As the number of users rapidly increases, a lot of sensitive data including personal information were stored on HDFS. So Hadoop announced a new version that introduces Kerberos and token system in 2009. However, this system is vulnerable to the replay attack, impersonation attack and other attacks. In this paper, we analyze these vulnerabilities of HDFS security and propose a new protocol which complements these vulnerabilities and maintains the performance of Hadoop.

[1]  Tom White,et al.  Hadoop: The Definitive Guide , 2009 .

[2]  Alexey Melnikov,et al.  Simple Authentication and Security Layer (SASL) , 2006, RFC.

[3]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[4]  Eric Sammer Hadoop Operations , 2012 .

[5]  GhemawatSanjay,et al.  The Google file system , 2003 .

[6]  Hairong Kuang,et al.  The Hadoop Distributed File System , 2010, 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST).

[7]  John T. Kohl,et al.  The Kerberos Network Authentication Service (V5 , 2004 .