DV-ARPA: Data Variety Aware Resource Provisioning for Big Data Processing in Accumulative Applications

In cloud computing, the resource provisioning approach has a large impact on processing cost, especially for Big Data processing. Because of data variety, the performance of a virtual machine (VM) may differ depending on the contents of the data block it processes. Data-variety-oblivious allocation therefore reduces VM performance and increases processing cost, whereas matching VMs to the given data blocks can reduce the total cost of a job. We use a data-variety-aware resource allocation approach to reduce the processing cost of the considered job. To this end, we divide the input data into data blocks, define the significance of each block, and select appropriate VMs based on that significance to reduce cost. To estimate the significance of each data block, we use a specific sampling method. The approach is applicable to accumulative applications. We evaluate it using well-known benchmarks and configured servers. The results show that our provisioning approach reduces processing cost by up to 35% compared with other approaches.
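The abstract does not say how significance is defined or how VMs are matched to blocks, so the following Python code is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes that a block's significance is the estimated number of records that contribute to the accumulative result (records matching a predicate), estimated from a uniform random sample of the block. It also assumes a hypothetical per-block cost model (scan every record, do extra work on significant records, multiply the hours by the VM's hourly price). The names `VMType`, `estimate_significance`, `processing_cost`, `allocate`, and `total_cost_single_vm`, as well as all prices and rates, are illustrative inventions.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class VMType:
    """Hypothetical VM type; all prices and rates are illustrative assumptions."""
    name: str
    price_per_hour: float
    scan_rate: float     # records read per hour (I/O-bound work on every record)
    compute_rate: float  # significant records processed per hour (extra CPU-bound work)


def estimate_significance(block, predicate, sample_size, rng):
    """Estimate how many records in `block` are significant (satisfy `predicate`)
    by scaling up the hit ratio of a uniform random sample without replacement."""
    n = min(sample_size, len(block))
    if n == 0:
        return 0.0
    hits = sum(1 for record in rng.sample(block, n) if predicate(record))
    return hits / n * len(block)


def processing_cost(block_size, significance, vm):
    """Assumed cost model: scan all records, then do extra work on significant ones."""
    hours = block_size / vm.scan_rate + significance / vm.compute_rate
    return hours * vm.price_per_hour


def allocate(blocks, vm_types, predicate, sample_size=100, seed=0):
    """Variety-aware allocation: give each block the VM type that is cheapest for
    its estimated significance. Returns one (vm, significance, cost) tuple per block."""
    rng = random.Random(seed)
    plan = []
    for block in blocks:
        sig = estimate_significance(block, predicate, sample_size, rng)
        best = min(vm_types, key=lambda vm: processing_cost(len(block), sig, vm))
        plan.append((best, sig, processing_cost(len(block), sig, best)))
    return plan


def total_cost_single_vm(blocks, significances, vm):
    """Variety-oblivious baseline: every block runs on the same VM type."""
    return sum(processing_cost(len(b), s, vm) for b, s in zip(blocks, significances))


if __name__ == "__main__":
    vms = [
        VMType("io-optimized", price_per_hour=1.0, scan_rate=1000.0, compute_rate=50.0),
        VMType("cpu-optimized", price_per_hour=2.0, scan_rate=500.0, compute_rate=1000.0),
    ]
    # Two synthetic log blocks of equal size but very different content.
    quiet = ["ERROR x"] * 10 + ["INFO ok"] * 990
    noisy = ["ERROR x"] * 950 + ["INFO ok"] * 50
    blocks = [quiet, noisy]
    plan = allocate(blocks, vms, lambda line: line.startswith("ERROR"))
    for vm, sig, cost in plan:
        print(f"{vm.name}: estimated significance {sig:.0f}, cost {cost:.2f}")
    sigs = [sig for _, sig, _ in plan]
    for vm in vms:
        print(f"all blocks on {vm.name}: {total_cost_single_vm(blocks, sigs, vm):.2f}")
```

With these illustrative numbers, the mostly-INFO block is assigned to `io-optimized` and the mostly-ERROR block to `cpu-optimized`, whichever records the sampler happens to draw. Because each block independently receives its cheapest VM type under the assumed model, the variety-aware total can never exceed the cost of running every block on any single VM type with the same significance estimates. Actual savings would depend on how well the sampled significance and the cost model track real VM performance.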
