Improving the Network Performance of a Container-Based Cloud Environment for Hadoop Systems

Cloud computing has emerged as an important paradigm for improving resource utilization, efficiency, and flexibility under a pay-per-use billing structure. However, the virtualization layer of cloud platforms degrades performance, so these platforms may not meet the requirements of high-performance applications such as big data. This paper tackles the problem of improving network performance in container-based cloud instances in order to create a viable alternative for running network-intensive Hadoop applications. Our approach consists of deploying link aggregation via the IEEE 802.3ad standard to increase the available bandwidth and using LXC (Linux Container) cloud instances to create a Hadoop cluster. To evaluate the efficiency of our approach and the overhead added by the container-based cloud environment, we ran a set of experiments measuring throughput, latency, bandwidth utilization, and completion times. The results show that our approach adds minimal overhead in the cloud environment while increasing throughput and reducing latency. Moreover, our approach proves to be a suitable alternative for running Hadoop applications, reducing completion times by up to 33.73%.
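
As a rough illustration of why link aggregation tends to help workloads with many concurrent flows, the sketch below shows how an aggregated group might map each source/destination pair to one physical link. It is a simplified XOR-based hash for illustration only, not the configuration used in the paper; the function name `select_link`, the MAC addresses, and the two-link setup are all hypothetical. Because each pair is pinned to a single link to preserve frame order, the extra bandwidth becomes available only when traffic is spread across several pairs.

```python
def select_link(src_mac: str, dst_mac: str, n_links: int) -> int:
    """Pick an egress link index for a source/destination pair.

    Simplified assumption: XOR the last byte of each MAC address and
    take the result modulo the number of aggregated links.
    """
    src_last = int(src_mac.split(":")[-1], 16)
    dst_last = int(dst_mac.split(":")[-1], 16)
    return (src_last ^ dst_last) % n_links


# Hypothetical addresses for one master node and three worker nodes.
master = "02:00:00:00:00:01"
workers = ["02:00:00:00:00:02", "02:00:00:00:00:03", "02:00:00:00:00:04"]

# Map each master-to-worker conversation onto one of two aggregated links.
links = {worker: select_link(master, worker, 2) for worker in workers}

for worker, link in links.items():
    print(f"{master} -> {worker}: link {link}")
```

In this sketch the three conversations land on both links, so their combined traffic can exceed the capacity of one link, while any single conversation remains limited to the link it hashes to.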
