A multi-parameter scheduling method of dynamic workloads for big data calculation in cloud computing

Workload scheduling in cloud computing is an active research field. Scheduling strongly affects cloud performance, especially when the platform is used for big data analysis and less predictable workloads enter the cloud dynamically. Finding an optimized scheduling solution for different parameters in different environments remains challenging. In dynamic environments such as the cloud, scheduling strategies must change quickly so that they can adapt to changes in the input workloads. However, the quality of a solution trades off against the time needed to find it. This article proposes an ordinal optimization method that considers the volume of workloads, load balancing, and the volume of messages exchanged among virtual clusters, taking replication into account. The algorithm is based on ordinal optimization (OO) and evolutionary OO (eOO). In each time period, a criterion is calculated that measures the similarity of the workloads in two consecutive time periods, which allows the scheduling procedure to be changed in a timely manner. Using more than one parameter, a suitable schedule is created for each time period. The schedule specifies the number of virtual machines assigned to each virtual cluster; if the workloads of two consecutive time periods are sufficiently similar, rescheduling is skipped. The results show that the method obtains a more optimized solution in a reasonable time than related methods such as blind pick, OO, Monte Carlo and eOO. The method is flexible: the weights of the proposed criteria can be changed to suit different environmental conditions. The proposed method achieved up to 28% performance improvement in comparison with eOO.
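
The procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation; everything specific in it is an assumption made for illustration: cosine similarity as the similarity criterion, the 0.95 threshold, the toy cost terms for workload, load balance and message volume, the default weights, and the names `similarity`, `weighted_cost`, `ordinal_select` and `schedule`. The ordinal optimization step is shown in its generic form: many candidate allocations are ranked with a cheap, noisy evaluation, and only a small selected subset is evaluated accurately.

```python
import random
from typing import List, Sequence, Tuple

Allocation = List[int]  # number of VMs assigned to each virtual cluster


def similarity(prev: Sequence[float], curr: Sequence[float]) -> float:
    """Cosine similarity of two per-cluster workload vectors (assumed criterion)."""
    dot = sum(p * c for p, c in zip(prev, curr))
    norm = (sum(p * p for p in prev) ** 0.5) * (sum(c * c for c in curr) ** 0.5)
    return dot / norm if norm else 1.0


def weighted_cost(alloc: Allocation, workload: Sequence[float],
                  messages: Sequence[float],
                  weights: Tuple[float, float, float]) -> float:
    """Toy multi-parameter cost; lower is better. All three terms are assumptions."""
    w_load, w_balance, w_msg = weights
    per_vm = [w / a for w, a in zip(workload, alloc)]   # work per VM in each cluster
    load = sum(per_vm)                                  # workload term
    imbalance = max(per_vm) - min(per_vm)               # load-balancing term
    traffic = sum(m * a for m, a in zip(messages, alloc))  # message-volume term
    return w_load * load + w_balance * imbalance + w_msg * traffic


def random_allocation(total_vms: int, clusters: int, rng: random.Random) -> Allocation:
    """Give every cluster one VM, then spread the remaining VMs at random."""
    alloc = [1] * clusters
    for _ in range(total_vms - clusters):
        alloc[rng.randrange(clusters)] += 1
    return alloc


def ordinal_select(workload, messages, weights, total_vms, rng,
                   n_candidates: int = 200, n_selected: int = 10) -> Allocation:
    """Rank candidates with a rough (noisy) model, then evaluate only the top few."""
    candidates = [random_allocation(total_vms, len(workload), rng)
                  for _ in range(n_candidates)]
    # Rough model: the accurate cost perturbed by noise, standing in for a
    # short, imprecise simulation.
    rough = sorted(candidates,
                   key=lambda a: weighted_cost(a, workload, messages, weights)
                   * rng.uniform(0.8, 1.2))
    selected = rough[:n_selected]
    # Accurate evaluation is applied only to the selected subset.
    return min(selected, key=lambda a: weighted_cost(a, workload, messages, weights))


def schedule(periods, messages, total_vms, weights=(0.5, 0.3, 0.2),
             threshold: float = 0.95, seed: int = 0) -> List[Allocation]:
    """Reschedule only when consecutive workloads are not similar enough."""
    rng = random.Random(seed)
    plan: List[Allocation] = []
    prev, alloc = None, None
    for workload in periods:
        if alloc is None or similarity(prev, workload) < threshold:
            alloc = ordinal_select(workload, messages, weights, total_vms, rng)
        plan.append(alloc)
        prev = workload  # always compare against the immediately preceding period
    return plan
```

Calling `schedule([[100, 50, 10], [102, 49, 11], [10, 50, 100]], messages=[1, 1, 1], total_vms=12)` reuses the first allocation for the second period, because the two workload vectors are nearly parallel, and computes a new allocation for the third. Changing `weights` shifts the emphasis among the three criteria, which mirrors the flexibility the abstract describes.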
