Building Application-Aware Network Environments using SDN for Optimizing Hadoop Applications

Hadoop has become the de facto standard for Big Data analytics, especially for workloads that use the MapReduce (M/R) framework. However, the lack of network awareness of the default MapReduce resource manager in Hadoop can cause unbalanced job scheduling, network bottleneck, and eventually increase the Hadoop run time if Hadoop nodes are clustered in several geographically distributed locations. In this paper, we present an application-aware network approach using software-defined networking (SDN) for distributed Hadoop clusters. We develop the SDN applications for this environment that consider network topology discovery, traffic monitoring, and flow rerouting in addition to loop avoidance mechanisms.