Implementation of Network Cards Optimizations in Hadoop Cluster Data Transmissions

In this paper, previously proposed network card optimization methods are applied to a Hadoop cluster in which data is transferred from the master node to a slave node. The slave node's network card setting is optimized according to the characteristics of the incoming data transmissions, namely the overall transmission size and the packet size. Throughput comparisons between the optimized network card settings and the default setting show that the optimized settings always yield higher throughput. At the same time, the optimized settings minimize CPU cycle utilization because they use timer-based polling (passive wait mode) to process received data packets. This practice, novel within Hadoop clusters, may be adopted by other data cluster vendors to improve the throughput and efficiency of their data transfers.
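To illustrate why timer-based polling (passive wait mode) can save CPU cycles, the following Python sketch runs a toy discrete-time simulation. It is not the paper's implementation and does not model a real NIC driver. It assumes that every check of the receive queue costs roughly the same CPU work, so the number of checks serves as a proxy for CPU cycles. The comparison against busy polling (active wait) is an illustrative assumption, since the abstract does not state what the default setting uses. The names `PollStats`, `busy_poll`, and `timer_poll`, the tick-based clock, and the traffic pattern are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class PollStats:
    polls: int        # times the receive queue was checked (proxy for CPU work)
    empty_polls: int  # checks that found nothing to process
    delivered: int    # packets taken off the queue
    max_latency: int  # worst wait, in ticks, between arrival and processing


def _simulate(arrivals: List[int], check_times: Iterable[int]) -> PollStats:
    """At each check time, drain every packet that has already arrived."""
    pending = sorted(arrivals)
    i = polls = empty = max_latency = 0
    for t in check_times:
        polls += 1
        start = i
        while i < len(pending) and pending[i] <= t:
            max_latency = max(max_latency, t - pending[i])
            i += 1
        if i == start:
            empty += 1
    return PollStats(polls, empty, i, max_latency)


def busy_poll(arrivals: List[int], horizon: int) -> PollStats:
    """Active wait (hypothetical baseline): check the queue on every tick."""
    return _simulate(arrivals, range(horizon))


def timer_poll(arrivals: List[int], horizon: int, interval: int) -> PollStats:
    """Passive wait: sleep for `interval` ticks, then process the whole batch."""
    if interval < 1:
        raise ValueError("interval must be at least one tick")
    return _simulate(arrivals, range(interval - 1, horizon, interval))


if __name__ == "__main__":
    # Hypothetical bursty traffic: two bursts of four packets each.
    burst = [0, 1, 2, 3, 50, 51, 52, 53]
    print("busy :", busy_poll(burst, horizon=100))
    print("timer:", timer_poll(burst, horizon=100, interval=10))
```

With the burst above, busy polling performs 100 checks, 92 of them empty, while timer polling with a 10-tick interval performs 10 checks, 8 of them empty, and delivers the same 8 packets. The price is latency: a packet may wait up to `interval - 1` ticks (here 9) before it is processed. This trade-off suggests why a polling setting would plausibly be tuned to the expected transmission size and packet size rather than fixed.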
