Abstract: The volume of data generated every moment requires specialized tools and techniques for efficient handling and storage. This paper proposes a technique for efficiently storing small files in the Hadoop Distributed File System (HDFS). Incoming files are filtered on two parameters: “file-type” (text, PDF, document, binary, etc.) and “file-size” (the amount of storage space the file requires). To secure file contents, we also propose encrypting the files with the Twofish cipher. Filtering and encryption are carried out before the files are passed to HDFS. For efficient storage, small files are merged into a single unit. Merging is dynamic and depends on the file type, rather than following a generalized merging strategy. Furthermore, to route files efficiently from source to destination and back, the proposal adopts Software Defined Networking (SDN). The empirical results show that the proposed architecture reduces both the Namenode memory overhead and the disk seek time.
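The filter-then-merge step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `merge_small_files`, `file_type`, and `read_file`, the use of the file extension as the "file-type", and the size threshold are all assumptions made for illustration. Twofish encryption and the SDN routing layer are left out.

```python
import os
from collections import defaultdict

# Hypothetical threshold: files below this size count as "small".
# The abstract gives no value; 128 MiB is a common HDFS block size.
SMALL_FILE_LIMIT = 128 * 1024 * 1024


def file_type(name):
    """Use the lowercase extension as a stand-in for the "file-type" parameter."""
    ext = os.path.splitext(name)[1].lower().lstrip(".")
    return ext or "unknown"


def merge_small_files(files, limit=SMALL_FILE_LIMIT):
    """Group small files by type and concatenate each group into one unit.

    `files` maps a file name to its bytes. Returns (merged, index, large):
    merged[type] is the concatenated payload for that type,
    index[name] is (type, offset, length) for later lookup, and
    large lists the names of files that are passed through unmerged.
    """
    buffers = defaultdict(bytearray)
    index = {}
    large = []
    for name, data in files.items():
        if len(data) >= limit:
            large.append(name)  # not a small file: store it on its own
            continue
        kind = file_type(name)
        # Record where this file starts inside the merged unit for its type.
        index[name] = (kind, len(buffers[kind]), len(data))
        buffers[kind].extend(data)
    merged = {kind: bytes(buf) for kind, buf in buffers.items()}
    return merged, index, large


def read_file(merged, index, name):
    """Recover one original file from the merged unit using the index."""
    kind, offset, length = index[name]
    return merged[kind][offset:offset + length]
```

Storing one merged unit per file type means the Namenode tracks a handful of large objects instead of one metadata entry per small file, which is the memory saving the abstract refers to. The per-file index is what keeps individual files retrievable.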