NS3 Based HDFS Data Placement Algorithm Evaluation Framework

Data exploration and utilization based on big data analytics holds immense promise for the future of businesses. However, as the name suggests, processing such large volumes of data is challenging. Hadoop, with its parallel processing solutions, helps process big data in reasonable time. The heart of Hadoop is its distributed file system (HDFS), and how data is placed in that file system dictates the speed of data processing. Hence, efficient data placement algorithms have been a key research area in big data analytics over the years. Evaluating such algorithms traditionally requires deploying HDFS on a hardware cluster and implementing the data placement algorithm on it. It is often difficult for researchers to acquire the required hardware and build such a cluster. Even when clusters are available, scalability becomes an issue. Moreover, a cluster resembling a real-life data center is not available to many researchers. Simulation provides a low-cost alternative for evaluating big data placement algorithms on HDFS. A key objective of data placement algorithms is to minimize communication cost and latency, so a framework built on network simulation fits this role well. NS3 is one of the most prominent network simulation tools available to researchers. However, full HDFS support for data placement research is not yet implemented in it. This work proposes to extend the NS3 simulation environment with HDFS support, for eventual use in evaluating data placement algorithms.
