Adaptive Join Operator for Federated Queries over Linked Data Endpoints

Traditional static query optimization is inadequate for query federation over linked data endpoints because data arrival rates are unpredictable and statistics are missing. In this paper, we propose an adaptive join operator for federated query processing that can change the join method during execution. Our approach always begins with symmetric hash join in order to produce the first result tuple as soon as possible, and switches to bind join when it estimates that bind join is more efficient than symmetric hash join for the rest of the process. We compare our approach with symmetric hash join and bind join. Performance evaluation shows that our approach provides optimal response time and adapts to different data arrival rates.
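
The switching idea can be illustrated with a minimal Python sketch. It is not the operator evaluated in the paper: the function name `adaptive_join`, the `lookup_right` callback (standing in for a bound sub-query sent to the remote endpoint), the `est_right_total` estimate, and the linear cost model are all assumptions made for illustration. The sketch also assumes that each input holds distinct tuples and that the switch is only considered once the left input has fully arrived.

```python
from collections import defaultdict

_END = object()  # sentinel marking an exhausted input stream


def adaptive_join(left, right, lookup_right, est_right_total,
                  arrival_cost=1.0, bind_cost=1.0):
    """Join two streams of (key, value) tuples on key.

    Returns (results, method_used). The cost model is hypothetical:
    remaining symmetric hash join cost = expected remaining right tuples
    times arrival_cost; bind join cost = distinct left keys times bind_cost.
    """
    left_table, right_table = defaultdict(list), defaultdict(list)
    results = []
    left_iter, right_iter = iter(left), iter(right)
    left_done = right_done = False
    seen_right = 0

    # Phase 1: symmetric hash join. Each arriving tuple is inserted into
    # its own hash table and immediately probes the other side's table.
    while not (left_done and right_done):
        if not left_done:
            item = next(left_iter, _END)
            if item is _END:
                left_done = True
            else:
                key, val = item
                left_table[key].append(val)
                results.extend((key, val, r) for r in right_table.get(key, ()))
        if not right_done:
            item = next(right_iter, _END)
            if item is _END:
                right_done = True
            else:
                key, val = item
                seen_right += 1
                right_table[key].append(val)
                results.extend((key, l, val) for l in left_table.get(key, ()))

        # Switch decision (assumed rule): compare the estimated remaining cost
        # of both methods once the left input is complete.
        if left_done and not right_done:
            shj_estimate = max(est_right_total - seen_right, 0) * arrival_cost
            bind_estimate = len(left_table) * bind_cost
            if bind_estimate < shj_estimate:
                # Phase 2: bind join. Ask the right source only for tuples
                # matching known left keys, skipping those already joined.
                for key, left_vals in left_table.items():
                    already = set(right_table.get(key, ()))
                    for r in lookup_right(key):
                        if r not in already:
                            results.extend((key, l, r) for l in left_vals)
                return results, "bind join"
    return results, "symmetric hash join"


# Illustrative data: the right source is slow, so binding is cheaper.
left = [("a", 1), ("b", 2)]
right = [("a", 10), ("c", 30), ("a", 11), ("b", 20), ("b", 21)]
index = defaultdict(list)
for k, v in right:
    index[k].append(v)

rows, method = adaptive_join(left, right, lambda k: index[k],
                             est_right_total=len(right),
                             arrival_cost=10.0, bind_cost=1.0)
print(method, sorted(rows))
```

With a high `arrival_cost` relative to `bind_cost`, the sketch switches to bind join as soon as the left input is complete; with the costs reversed, it stays with symmetric hash join throughout. Both paths produce the same set of result tuples.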
