BigTrustScheduling: Trust-aware big data task scheduling approach in cloud computing environments

Abstract Big data task scheduling in cloud computing environments has gained considerable attention in the past few years, owing to the exponential growth in the number of businesses that rely on cloud-based infrastructure as a backbone for big data storage and analytics. The main challenge in scheduling big data services in cloud-based environments is to guarantee a minimal makespan while also minimizing the amount of utilized resources. Several approaches have been proposed to address this challenge. Their main limitation is that they overlook the trust levels of the Virtual Machines (VMs), thereby endangering the overall Quality of Service (QoS) of the big data analytics process, which covers not only heartbeat frequency ratio and resource consumption but also security challenges such as intrusion detection, access control, and authentication. To overcome this limitation, we propose in this work a trust-aware scheduling solution called BigTrustScheduling that consists of three stages: VMs' trust level computation, tasks' priority level determination, and trust-aware scheduling. Experiments conducted on a real Hadoop cluster environment, using real-world datasets collected from the Google Cloud Platform pricing and the Bitbrains task and resource requirements, show that our solution reduces the makespan by 59% compared to Shortest Job First (SJF), by 48% compared to Round Robin (RR), and by 40% compared to the improved Particle Swarm Optimization (PSO) approach in the presence of untrusted VMs. Moreover, our solution decreases the monetary cost by 58% compared to SJF, by 47% compared to RR, and by 38% compared to the improved PSO in the presence of untrusted VMs. The results of this work are applicable to other problems by tuning the corresponding metrics in the formulation of the problem and the solution, as well as in the experimental environment. In particular, the trust model can be extended to other environments, including cloud computing, IoT, and parallel computing.
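
The three stages named in the abstract can be illustrated with a toy Python sketch. This is not the BigTrustScheduling algorithm: the abstract does not give its formulas, so everything here is an assumption made for illustration. The trust level is taken to be a weighted average of hypothetical monitored metrics, task priorities are supplied as inputs rather than derived, and the scheduler is a simple greedy rule that places each task on the trusted VM with the earliest finish time. The names `VM`, `Task`, `trust_level`, `schedule`, and the `threshold` parameter are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    speed: float            # work units per second (hypothetical)
    metrics: dict           # monitored indicators scaled to [0, 1] (hypothetical)
    busy_until: float = 0.0


@dataclass
class Task:
    name: str
    length: float           # work units
    priority: int           # larger means more urgent; given, not derived


def trust_level(vm, weights):
    """Stage 1 (assumed form): weighted average of the VM's metrics."""
    total = sum(weights.values())
    return sum(weights[k] * vm.metrics[k] for k in weights) / total


def schedule(tasks, vms, weights, threshold=0.5):
    """Stages 2 and 3 (assumed form): order tasks by priority, then
    greedily place each one on the trusted VM that finishes it earliest."""
    trusted = [vm for vm in vms if trust_level(vm, weights) >= threshold]
    if not trusted:
        raise ValueError("no VM meets the trust threshold")
    plan = {}
    for task in sorted(tasks, key=lambda t: -t.priority):
        best = min(trusted, key=lambda vm: vm.busy_until + task.length / vm.speed)
        best.busy_until += task.length / best.speed
        plan[task.name] = best.name
    # Makespan: time at which the last trusted VM becomes idle.
    makespan = max(vm.busy_until for vm in trusted)
    return plan, makespan


weights = {"cpu": 0.5, "uptime": 0.5}
vms = [
    VM("vm-a", 2.0, {"cpu": 0.9, "uptime": 0.8}),
    VM("vm-b", 4.0, {"cpu": 0.2, "uptime": 0.3}),  # fastest, but untrusted
    VM("vm-c", 1.0, {"cpu": 0.7, "uptime": 0.9}),
]
tasks = [Task("t1", 8.0, 3), Task("t2", 4.0, 2), Task("t3", 2.0, 1)]

plan, makespan = schedule(tasks, vms, weights)
print(plan, makespan)
```

In this toy run the fastest machine, `vm-b`, falls below the assumed trust threshold and receives no tasks, so the schedule trades raw speed for trust. The makespan and cost reductions reported in the abstract come from the authors' own model and experiments, not from a rule like this one.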
