Supporting Biodiversity Studies by the EUBrazilOpenBio Hybrid Data Infrastructure

EUBrazilOpenBio is a collaborative initiative that addresses strategic barriers in biodiversity research by integrating open-access data and user-friendly tools widely available in Brazil and Europe. The project deploys the EU-Brazil Hybrid Data Infrastructure, which enables on-demand sharing of hardware, software and data. The infrastructure provides access to a set of integrated services and resources that seamlessly aggregate taxonomic, biodiversity and climate data; these data are then consumed by processing services that implement checklist cross-mapping and ecological niche modelling. A Virtual Research Environment was created to give users a single entry point to these processing and data resources. This article describes the architecture and the demonstration use cases, and reports experimental results and their validation. Copyright © 2014 John Wiley & Sons, Ltd.
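To make "ecological niche modelling" concrete, the sketch below shows one of the simplest approaches, a BIOCLIM-style climate envelope fitted to species occurrence records joined with climate values. It is an illustration under stated assumptions, not the algorithm run by the EUBrazilOpenBio services, which this abstract does not specify. The variable names (`temp_c`, `precip_mm`), the sample values and the functions `fit_envelope` and `predict_suitable` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical record: environmental variable name -> value at one location
Record = Dict[str, float]
# Envelope: variable name -> (lower bound, upper bound)
Envelope = Dict[str, Tuple[float, float]]


def fit_envelope(presences: List[Record]) -> Envelope:
    """Fit a BIOCLIM-style envelope: per-variable min/max over presence records."""
    if not presences:
        raise ValueError("need at least one presence record")
    variables = presences[0].keys()
    return {
        v: (min(p[v] for p in presences), max(p[v] for p in presences))
        for v in variables
    }


def predict_suitable(envelope: Envelope, cell: Record) -> bool:
    """A cell is predicted suitable only if every variable lies inside its bounds."""
    return all(lo <= cell[v] <= hi for v, (lo, hi) in envelope.items())


if __name__ == "__main__":
    # Hypothetical occurrence records already joined with climate layers
    presences = [
        {"temp_c": 24.0, "precip_mm": 1800.0},
        {"temp_c": 26.5, "precip_mm": 2100.0},
        {"temp_c": 25.1, "precip_mm": 1950.0},
    ]
    env = fit_envelope(presences)
    print(env)  # {'temp_c': (24.0, 26.5), 'precip_mm': (1800.0, 2100.0)}
    print(predict_suitable(env, {"temp_c": 25.0, "precip_mm": 2000.0}))  # True
    print(predict_suitable(env, {"temp_c": 18.0, "precip_mm": 2000.0}))  # False
```

The envelope is built from presence records only, so it needs no absence data; more elaborate niche-modelling algorithms replace the hard inside/outside test with a continuous suitability score, for example a distance from the centre of the occupied climate space.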
