A Proposal for High Availability of HDFS Architecture based on Threshold Limit and Saturation Limit of the Namenode

Big Data, one of the newest technologies in science and engineering, has driven a major shift toward data-centric architectures. Hadoop has followed closely behind it, becoming the dominant platform in the Big Data ecosystem and the foundation for storing and analyzing big data. This paper discusses a hierarchical architecture of Hadoop nodes, namely Namenodes and Datanodes, for maintaining a High Availability Hadoop Distributed File System (HDFS). The proposed architecture builds on two fundamental principles of Hadoop: the master-slave architecture and the elimination of a single point of failure. It is designed so that the load on the Datanodes stays near optimal and no data is lost as the volume of data grows.
