Impact of Link Failures on the Performance of MapReduce in Data Center Networks

In this paper, we use Mixed Integer Linear Programming (MILP) models to determine the impact of link failures on the performance of MapReduce shuffling operations across different data center network (DCN) topologies. For a set of non-fatal single-link and multi-link failures, the results indicate that the degradation in completion time differs between DCNs, ranging between 5% and 40%. The best performance under link failures is achieved by a server-centric passive optical network (PON)-based DCN.
