Improving a Run Time Job Prediction Model for Distributed Computing Based on Two Level Predictions

Distributed computing environments face growing difficulties because the number of submitted jobs is increasing dramatically. One of the most widely used approaches to serving these jobs is to predict their run time accurately. This paper proposes a new method that predicts job run time using two levels of prediction: a linear regression model followed by a fitting model. The proposed model uses six variables: user ID, group ID, executable ID, number of CPUs, memory size, and average CPU time. The categorical variables (user ID, group ID, and executable ID) are handled with dummy coding. To find the best combination of the linear regression model and a fitting model, several linear and nonlinear fitting models are evaluated. Simulation results show that the proposed model outperforms previous models when smoothing spline fitting is used, achieving lower error and a higher prediction rate.
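The two-level idea can be illustrated with a minimal sketch. This is not the authors' implementation: all function names and the toy job records are hypothetical, the tiny ridge term is an added assumption for numerical stability, and the second level uses a simple quadratic fit as a stand-in for the smoothing spline that the abstract reports as the best choice.

```python
# Toy job records (made-up values, for illustration only):
# (user ID, group ID, executable ID, CPUs, memory in GB, avg CPU time, run time)
JOBS = [
    ("u1", "g1", "e1", 1, 0.5, 2.0, 3.1), ("u1", "g1", "e2", 2, 1.0, 4.0, 7.9),
    ("u2", "g1", "e1", 4, 2.0, 3.0, 6.2), ("u2", "g2", "e2", 8, 4.0, 6.0, 15.8),
    ("u1", "g2", "e1", 2, 2.0, 5.0, 9.7), ("u2", "g2", "e1", 1, 1.0, 1.0, 2.2),
    ("u1", "g2", "e2", 4, 0.5, 7.0, 18.5), ("u2", "g1", "e2", 8, 1.0, 2.0, 5.0),
    ("u1", "g1", "e1", 8, 4.0, 8.0, 21.0), ("u2", "g2", "e2", 2, 0.5, 3.0, 6.9),
    ("u1", "g2", "e1", 4, 1.0, 6.0, 13.4), ("u2", "g1", "e2", 1, 2.0, 4.0, 8.3),
]

def dummy_code(values):
    """Dummy-code one categorical column; the first (sorted) level is the baseline."""
    levels = sorted(set(values))
    return [[1.0 if v == lvl else 0.0 for lvl in levels[1:]] for v in values]

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def least_squares(X, y, ridge=1e-9):
    """Least squares via the normal equations; the small ridge is an assumption."""
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) + (ridge if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)

def build_design(jobs):
    """Intercept + dummy-coded user/group/executable + three numeric variables."""
    cats = [dummy_code([job[k] for job in jobs]) for k in range(3)]
    return [[1.0] + cats[0][i] + cats[1][i] + cats[2][i]
            + [float(v) for v in job[3:6]] for i, job in enumerate(jobs)]

def predict(coefs, rows):
    return [sum(c * x for c, x in zip(coefs, row)) for row in rows]

def fit_two_level(jobs):
    """Level 1: linear regression. Level 2: quadratic fit on level-1 output."""
    X = build_design(jobs)
    y = [job[-1] for job in jobs]
    level1 = predict(least_squares(X, y), X)
    Z = [[1.0, p, p * p] for p in level1]  # stand-in for a smoothing spline
    level2 = predict(least_squares(Z, y), Z)
    return level1, level2

def sse(actual, predicted):
    """Sum of squared errors between actual and predicted run times."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))

y = [job[-1] for job in JOBS]
level1, level2 = fit_two_level(JOBS)
print("level-1 SSE:", sse(y, level1))
print("level-2 SSE:", sse(y, level2))
```

Because the second-level fit includes the identity mapping as a special case, its training error cannot exceed that of the first level by more than numerical noise. Whether it also improves predictions on unseen jobs depends on the data and on the fitting model chosen.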
