DrillBeyond: Open-World SQL Queries Using Web Tables
The Web consists of a huge number of documents, but also of large amounts of structured information, for example in the form of HTML tables containing relational-style data. One typical usage scenario for this kind of data is its integration into a database or data warehouse in order to apply data analytics. However, today’s business intelligence tools evidently lack support for so-called situational or ad-hoc data integration. In this demonstration we will therefore present DrillBeyond, a novel database and information retrieval engine that allows users to query a local database as well as web datasets in a seamless and integrated way using standard SQL. The audience will be able to pose queries to our DrillBeyond system, which will be answered partly from local data in the database and partly from datasets that originate from the Web of Data. We will demonstrate the integration of the web tables back into the DBMS in order to apply its analytical features.

1 Open-World SQL Queries

The system we want to demonstrate offers a novel way of integrating web tables into regular query processing in a relational database. We present a modified RDBMS that is able to answer so-called open-world queries, which are not restricted to the schema of the local database. Instead, the user is allowed to use arbitrary attribute names that do not appear in the original schema. Consider the following running example query:

    SELECT population, n_name, AVG(o_totalprice)
    FROM nation
    JOIN region ON n_regionkey = r_regionkey
    JOIN customer ON n_nationkey = c_nationkey
    JOIN orders ON c_custkey = o_custkey
    WHERE r_name = 'AMERICA'
    GROUP BY population, n_name
    ORDER BY population

The population attribute, which is used in the SELECT, GROUP BY, and ORDER BY clauses, is not part of the TPC-H schema and therefore requires special processing.
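To make the notion of an open-world attribute concrete, the following minimal Python sketch finds the identifiers in the running example query that the local schema cannot resolve. It is an illustration only and not DrillBeyond's implementation: a modified RDBMS would presumably detect such attributes during query analysis rather than with a regular expression. The names `LOCAL_SCHEMA`, `SQL_KEYWORDS`, and `find_open_attributes` are hypothetical, and the schema is limited to the four TPC-H tables the query touches.

```python
import re

# Assumption: the four TPC-H tables used by the running example query.
LOCAL_SCHEMA = {
    "nation": {"n_nationkey", "n_name", "n_regionkey", "n_comment"},
    "region": {"r_regionkey", "r_name", "r_comment"},
    "customer": {"c_custkey", "c_name", "c_address", "c_nationkey",
                 "c_phone", "c_acctbal", "c_mktsegment", "c_comment"},
    "orders": {"o_orderkey", "o_custkey", "o_orderstatus", "o_totalprice",
               "o_orderdate", "o_orderpriority", "o_clerk",
               "o_shippriority", "o_comment"},
}

# Only the keywords and functions needed for simple queries like the example.
SQL_KEYWORDS = {"select", "from", "join", "on", "where", "group", "by",
                "order", "and", "or", "as", "avg", "sum", "count"}

QUERY = """
SELECT population, n_name, AVG(o_totalprice)
FROM nation
JOIN region ON n_regionkey = r_regionkey
JOIN customer ON n_nationkey = c_nationkey
JOIN orders ON c_custkey = o_custkey
WHERE r_name = 'AMERICA'
GROUP BY population, n_name
ORDER BY population
"""


def find_open_attributes(query, schema):
    """Return identifiers that are neither keywords, tables, nor known columns."""
    known = set(schema)  # table names
    for columns in schema.values():
        known |= columns
    # Drop string literals so that 'AMERICA' is not mistaken for an identifier.
    without_literals = re.sub(r"'[^']*'", " ", query)
    open_attributes = []
    for token in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", without_literals):
        name = token.lower()
        if name in SQL_KEYWORDS or name in known or name in open_attributes:
            continue
        open_attributes.append(name)
    return open_attributes


if __name__ == "__main__":
    print(find_open_attributes(QUERY, LOCAL_SCHEMA))
```

For the running example, the only identifier left unresolved is `population`, which is the attribute that needs special processing. The regex-based tokenizer is a deliberate simplification: it ignores aliases, qualified names, and quoted identifiers, all of which a real SQL parser would handle.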
In the DrillBeyond system, missing attributes are translated into keyword queries that are run against an index of open datasets on the web. It will answer the query by substituting the missing attribute