A performance optimization strategy based on degree of parallelism and allocation fitness

With the emergence of the big data era, most current performance optimization strategies target distributed computing frameworks that use disks as the underlying storage. They may solve the problems of traditional disk-based distributed systems, but they are hard to transplant and are not well suited to optimizing an in-memory computing framework, owing to differences in the underlying storage and computation architecture. In this paper, we first define a resource allocation model, a parallelism degree model, and an allocation fitness model based on a theoretical analysis of the Spark architecture. Second, building on these models, we propose an easy-to-apply strategy embedded in the evaluation model. The optimization strategy assigns subsequent tasks to workers with a lower load that satisfy the resource requirements, while workers with a higher load may not be assigned tasks. Experiments consisting of four different jobs are conducted to verify the effectiveness of the presented strategy.
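To make the selection rule concrete, the following is a minimal illustrative sketch (not the paper's implementation) of load- and fitness-aware worker selection. The Worker and Task types, the fitness score, and the maxLoad threshold are all hypothetical assumptions introduced here for illustration.

```scala
// Illustrative sketch only: hypothetical types and thresholds, not the paper's code.
case class Worker(id: String, freeCores: Int, freeMemoryMB: Long, load: Double)
case class Task(requiredCores: Int, requiredMemoryMB: Long)

object AllocationFitness {
  // A simple assumed fitness score: workers with more spare resources
  // and a lower current load score higher for a given task.
  def fitness(w: Worker, t: Task): Double =
    if (w.freeCores < t.requiredCores || w.freeMemoryMB < t.requiredMemoryMB) Double.MinValue
    else (w.freeCores - t.requiredCores) +
         (w.freeMemoryMB - t.requiredMemoryMB) / 1024.0 -
         w.load

  // Pick the best-fitting worker that satisfies the task's requirements;
  // heavily loaded workers (load above maxLoad) are skipped entirely.
  def selectWorker(workers: Seq[Worker], task: Task, maxLoad: Double = 0.8): Option[Worker] =
    workers
      .filter(w => w.load <= maxLoad)
      .filter(w => w.freeCores >= task.requiredCores && w.freeMemoryMB >= task.requiredMemoryMB)
      .sortBy(w => -fitness(w, task))
      .headOption
}
```

Under these assumptions, a scheduler would call selectWorker for each pending task and leave the task queued if no worker meets both the load threshold and the resource requirements.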
