Enabling Strategies for Big Data Analytics in Hybrid Infrastructures

A huge volume of data is produced every day by social networks (e.g., Facebook, Instagram, WhatsApp), sensors, mobile devices, and other applications. Although Cloud computing has grown rapidly in recent years, it still lacks standardization in resource management for Big Data applications such as MapReduce. As a result, users face a major challenge in understanding the requirements of their applications and in consolidating resources properly. This scenario raises significant challenges in several areas (systems, infrastructure, and platforms) and opens research opportunities in Big Data Analytics. This work proposes the use of hybrid infrastructures, such as Cloud combined with Volunteer Computing, for Big Data processing and analysis. In addition, it provides a data distribution model that improves the resource management of Big Data applications in hybrid infrastructures. The results indicate that hybrid infrastructures are feasible, since they support the reproducibility and predictability of Big Data processing in low- and high-scale simulations within hybrid infrastructures.
