Clouds for scalable Big Data processing

The last decade has been characterised by an exponential growth of digital data production. This trend is particularly strong in scientific computing: in the biological, medical, astronomical and earth science fields, very large data sets are produced every day from the observation or simulation of complex phenomena. At the same time, new massive sources of digital data have emerged, including social media platforms such as Facebook, Instagram, and Twitter, which are credited among the most important sources of data production on the Internet. This Big Data is hard to process with conventional computing technologies and demands parallel and distributed processing, which can be effectively provided by Cloud computing systems and services. This special issue focuses on the use and modelling of Clouds as scalable platforms for addressing the computational and data storage needs of the Big Data applications being developed today.

In the first paper [1], Belcastro et al. address the main issues in the area of programming models and systems for Big Data analysis, which are extensively used in Cloud environments. As a first contribution, the most popular programming models for Big Data analysis (MapReduce, Directed Acyclic Graph, Message Passing, Bulk Synchronous Parallel, Workflow and SQL-like) are presented and discussed. The paper then analyses and compares the features of the main systems implementing these models, with the aim of helping developers identify and select the best solution according to their skills, hardware availability, and application needs.
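To give a flavour of the first of these programming models, the following is a minimal word-count sketch of the MapReduce pattern in plain Python; the function names and the toy documents are illustrative, not taken from any of the systems surveyed in the paper.

```python
from collections import defaultdict

def map_phase(documents):
    # Map step: emit a (word, 1) pair for every word in every document
    for doc in documents:
        for word in doc.split():
            yield word.lower(), 1

def reduce_phase(pairs):
    # Reduce step: group the pairs by key and sum the counts per word
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["Big Data on Clouds", "Big Data analysis on Clouds"]
print(reduce_phase(map_phase(docs)))
```

In a real MapReduce system the map and reduce phases run in parallel on many nodes, with a shuffle stage grouping the intermediate pairs by key between them; the sketch above only shows the data flow on a single machine.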
Specifically, the systems are compared according to four criteria: (i) level of abstraction, which refers to a system's capability of hiding low-level programming details; (ii) type of parallelism, which describes the way in which a system allows parallel operations to be expressed; (iii) infrastructure scale, which refers to the capability of a system to efficiently execute applications by taking advantage of the infrastructure size; and (iv) classes of applications, which describes the most common application domains of a system.

The second paper [4], by Ristov et al., focuses on the accurate scalability modelling of Cloud elastic services. The speedup and efficiency parameters provide important information about the performance of a computer system with scaled resources compared with a single-processor system. However, as the load on Cloud elastic services is variable, it is also vital to analyse the load in order to determine which system is more effective and efficient. The paper argues that speedup and efficiency alone are not sufficient for proper modelling of Cloud elastic services, as both assume that the system's resources are scaled while the load remains constant. Accordingly, the paper defines two additional scaled systems by (i) scaling the load and (ii) scaling both the load and the resources. A model is introduced to determine the efficiency of each scaled system, which can be used to compare the efficiencies of all scaled systems, regardless of whether they are scaled in terms of load or resources. An evaluation of the model using Microsoft Azure is presented to confirm the theoretical analysis experimentally.
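The classic quantities discussed above can be sketched as follows. The speedup and efficiency definitions are the standard textbook ones; the load-scaled variant is only an illustrative assumption of how efficiency might be measured when the load grows on fixed resources, not the exact model defined in the paper.

```python
def speedup(t_serial, t_parallel):
    # Speedup: runtime on one processor divided by runtime on p processors
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, p):
    # Classic efficiency: speedup normalised by the number of processors
    return speedup(t_serial, t_parallel) / p

def scaled_load_efficiency(t_base, t_scaled, load_factor):
    # Illustrative (hypothetical) measure for a load-scaled system:
    # observed runtime compared against ideal linear growth of the
    # baseline runtime when the load grows by load_factor
    return (t_base * load_factor) / t_scaled

# A job that takes 100 s on one VM and 30 s on four VMs
print(efficiency(100, 30, 4))           # speedup / 4, just below 1
# Doubling the load raises runtime from 100 s to 210 s on the same VMs
print(scaled_load_efficiency(100, 210, 2))
```

The point the paper makes is visible even in this toy form: the classic efficiency assumes the load is fixed while resources scale, so a separate measure is needed once the load itself varies, as it does for elastic Cloud services.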

[1] Domenico Talia, et al., "Programming models and systems for Big Data analysis," Int. J. Parallel Emergent Distributed Syst., 2019.

[2] M. Tahar Kechadi, et al., "Parallel and distributed clustering framework for big spatial data mining," Int. J. Parallel Emergent Distributed Syst., 2019.

[3] Eugenio Cesario, et al., "Data analytics for energy-efficient clouds: design, implementation and evaluation," Int. J. Parallel Emergent Distributed Syst., 2019.

[4] Radu Prodan, et al., "A new model for cloud elastic services efficiency," Int. J. Parallel Emergent Distributed Syst., 2019.