Algorithms for processing biodiversity queries in a distributed environment

The GBIF portal describes most of the world's biodiversity data. It faces two problems, limited data availability and poor query expressiveness, mainly due to a growing number of users who keep expressing new needs. To tackle these problems, we envision a scalable and relatively low-cost solution. With this in mind, we propose a non-invasive, decentralized architecture for processing GBIF queries over a cloud infrastructure. We define a dynamic data distribution strategy and query processing algorithms that fit the GBIF requirements. We demonstrate the feasibility and efficiency of our solution with a prototype implementation that processes additional query types not yet supported by the GBIF portal.
