A Proposed Approach for Improving Hadoop Performance for Handling Small Files

As the world becomes increasingly digitized, data is flowing in from many sources and in many formats at a speed that traditional systems cannot compute and analyze; data of this kind is called big data. To analyze and process big data properly, tools such as Hadoop, an open-source software framework, are used. Hadoop stores and computes data in a distributed environment. Big data matters because it plays a large part in generating benefits for today’s businesses: big data analysis captures the wealth of information a company holds and quickly converts it into actionable insights. However, when it comes to storing and accessing a huge number of small files, a bottleneck arises in the name node of Hadoop. In this work, we propose a method to optimize the operation of the name node by removing the bottleneck caused by massive numbers of small files.
