Balanced Energy-Aware and Fault-Tolerant Data Center Scheduling

Fault tolerance, performance, and throughput have been major areas of research and development since the emergence of large-scale networks. Internet-based applications are growing rapidly, including large-scale computation, search engines, high-definition video streaming, e-commerce, and video on demand. In recent years, energy efficiency and fault tolerance have gained significant importance in data center networks, and various studies have directed attention toward green computing. Data centers consume a large amount of energy, and various architectures and techniques have been proposed to improve their energy efficiency. However, there is a tradeoff between energy efficiency and fault tolerance. The objective of this study is to highlight a better tradeoff between two extremes: (a) high energy efficiency and (b) high availability through fault tolerance and redundancy. The proposed Energy-Aware Fault-Tolerant (EAFT) approach keeps one level of redundancy for fault tolerance while scheduling resources for energy efficiency. The resulting energy-efficient data center network provides availability as well as fault tolerance at reduced operating cost. The main contributions of this article are: (a) we propose an EAFT data center network scheduler; (b) we compare EAFT with energy-efficient resource scheduling techniques and analyze parameters such as workload distribution, average tasks per server, and energy consumption; and (c) we highlight the effects of energy efficiency techniques on the network performance of the data center.
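The idea of keeping one level of redundancy while consolidating load can be illustrated with a minimal sketch. This is not the EAFT algorithm itself, which the abstract does not specify. It assumes a simple first-fit-decreasing consolidation, integer load units, and hypothetical names (`Server`, `schedule`, `spare_count`): tasks are packed onto as few servers as possible, one idle server stays powered on as a standby, and the remaining idle servers are candidates for power-off.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    capacity: int                      # hypothetical load units (e.g. percent)
    tasks: list = field(default_factory=list)

    @property
    def load(self):
        return sum(self.tasks)

    def fits(self, demand):
        return self.load + demand <= self.capacity


def schedule(task_demands, servers, spare_count=1):
    """Pack tasks first-fit-decreasing, then keep `spare_count` idle
    servers powered on as standby (assumed redundancy level)."""
    for demand in sorted(task_demands, reverse=True):
        # Place each task on the first server that still has room.
        target = next((s for s in servers if s.fits(demand)), None)
        if target is None:
            raise RuntimeError(f"no capacity for task with demand {demand}")
        target.tasks.append(demand)

    active = [s for s in servers if s.tasks]
    idle = [s for s in servers if not s.tasks]
    standby = idle[:spare_count]       # stays on for fault tolerance
    powered_off = idle[spare_count:]   # switched off to save energy
    return active, standby, powered_off


servers = [Server(f"s{i}", capacity=100) for i in range(5)]
active, standby, powered_off = schedule([60, 50, 40, 30, 20], servers)
print([s.name for s in active], [s.name for s in standby],
      [s.name for s in powered_off])
```

Raising `spare_count` trades energy savings for more redundancy, and setting it to zero gives pure consolidation, which is the tradeoff the abstract describes.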
