Big data analytics for climate change and biodiversity in the EUBrazilCC federated cloud infrastructure

The analysis of large volumes of data is key to knowledge discovery in several scientific domains, such as climate science, astrophysics and the life sciences. It requires large computational and storage resources, as well as flexible and efficient software able to exploit the available infrastructure dynamically and to cope with the volume, distribution, velocity and heterogeneity of the data. This paper presents a data-driven, cloud-based use case for the analysis of climate change and biodiversity data, implemented in the context of the EUBrazilCC project. It discusses in detail the architecture and main components of the use case, together with the Platform as a Service (PaaS) framework for big data analytics named PDAS and its elastic deployment on the EUBrazilCC federated cloud infrastructure.