Performance Prediction of Spark Based on the Multiple Linear Regression Analysis

Evaluating the performance of a cloud platform and identifying the main factors that influence it are crucial. Moreover, the results of analyzing the related performance indicators can be used to make theoretical predictions about the platform's performance status. This work focuses on the relationships between the performance indicators of a Spark-based cloud platform and the load performance of the cluster, and on using those relationships to predict the load performance effectively. First, we put forward frameworks for Spark performance analysis, for the analysis of specific indicators, and for a prediction model of the cluster load. Second, for the evaluation indicators, we explain the basis for their selection and their concrete meaning; we then use the MLR (Multiple Linear Regression) method to calculate the correlation formula between the performance parameters actually produced while the Spark cluster runs batch applications and the load performance of the cluster, and thereby determine the main factors affecting the load performance. Finally, we predict the load value using the Spark indicator analysis and the load prediction model. The results indicate that the accuracy is up to 92.307%. Consequently, the solution presented in this paper predicts the cluster load value efficiently.
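
As a rough illustration of the MLR step described above, the following sketch fits a linear model of the form load ≈ b0 + b1*x1 + ... + bk*xk by ordinary least squares. It is not the paper's implementation: the function names `fit_mlr` and `predict_mlr` are hypothetical, the three indicator columns are unnamed placeholders, and the data are synthetic, since the abstract does not list the actual indicators or measurements.

```python
import numpy as np

def fit_mlr(X, y):
    """Fit y ~ b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    X: (n_samples, n_indicators) array of performance indicators (placeholders).
    y: (n_samples,) array of observed cluster load values.
    Returns the coefficient vector [b0, b1, ..., bk].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept b0.
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Predict load values for new indicator rows using fitted coefficients."""
    X = np.asarray(X, dtype=float)
    return coef[0] + X @ coef[1:]

# Synthetic, noise-free data (an assumption for illustration only).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(50, 3))      # 3 hypothetical indicators
true_coef = np.array([0.5, 2.0, -1.0, 0.3])  # made-up b0..b3
y = true_coef[0] + X @ true_coef[1:]

coef = fit_mlr(X, y)
predicted = predict_mlr(coef, X)
```

In such a model, the magnitude of each fitted coefficient (on comparably scaled indicators) gives a first indication of which indicators weigh most on the predicted load; how the paper ranks the main factors is not specified in the abstract.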
