The performance evaluations for Hadoop in diverse system architectures

The utility of data has improved significantly through the discovery of latent features in big data. However, a critical obstacle to extracting knowledge from big data is that large volumes of data must be processed with very limited computing resources. To address this problem, a range of distributed computing technologies has been developed, among which the Hadoop framework has become the best known owing to its reliability and scalability. The Hadoop framework relies heavily on the Linux operating system to run its components. However, the latest released versions of the framework only support 32-bit Linux operating systems, which may leave system resources underutilized because of the limits of a 32-bit system (for example, a 32-bit process can address at most 4 GB of memory). Therefore, to enable Hadoop on 64-bit operating systems and to observe its performance there, this paper builds a 64-bit Hadoop framework and evaluates its performance with the standard WordCount benchmark on both 32-bit and 64-bit operating systems.
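The WordCount benchmark follows the MapReduce pattern: a map step emits a (word, 1) pair for every token, the framework groups the pairs by word, and a reduce step sums each group. The following is a minimal, single-process Python sketch of that data flow, assuming whitespace tokenization and case-sensitive counting. The function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical illustrations, not Hadoop's API (Hadoop's own WordCount example is written in Java).

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Hypothetical mapper: emit (word, 1) for every whitespace-separated token."""
    for word in line.split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, standing in for the framework's shuffle/sort."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Hypothetical reducer: sum the partial counts for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce sequentially over the input lines."""
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    # Keys are sorted to mimic the sorted reducer input in MapReduce.
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    sample = ["big data big clusters", "data on clusters"]
    print(word_count(sample))  # {'big': 2, 'clusters': 2, 'data': 2, 'on': 1}
```

In a real Hadoop job the map and reduce calls run in parallel across cluster nodes and the grouping is performed by the framework; the sketch runs them sequentially only to make the data flow visible.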