Discovering Related Data Sources in Data-Portals

To allow effective querying on the Web of data, systems frequently rely on data from multiple sources to answer queries. For instance, a user may wish to combine data from sources contained in different statistical catalogs. To enable interactive exploration of the results of such federated queries, systems must involve the user in data source selection. That is, a user should be able to choose the data sources that contribute to query results, and thereby refine or expand current findings. This requires effective recommendations of data sources to pick, a task we call data source contextualization. Recent work, however, addresses source contextualization only for “Web tables” and relies heavily on schema information and simple table structures. To address these shortcomings, we draw on work from the field of data mining and show how to enable effective contextualization of Web data sources. Based on a real-world finance use case, we built a contextualization engine and integrated it into our data portal, a Web search system for accessing statistical data sets.
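To make the contextualization task concrete, the sketch below shows only its input and output: given a source the user has already selected, return a ranked list of other sources to recommend. It is not the data-mining method described above. It substitutes a naive Jaccard overlap between hypothetical per-source feature sets (for example, entities or dimension values that occur in each data set). The function names `jaccard` and `contextualize` and the example sources are illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two feature sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def contextualize(
    selected: str,
    sources: Dict[str, Set[str]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Hypothetical baseline: rank the other sources by feature overlap
    with the user's selected source and return the top k."""
    target = sources[selected]
    scored = [
        (name, jaccard(target, features))
        for name, features in sources.items()
        if name != selected
    ]
    # Highest score first; ties broken by name so the output is deterministic.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical sources described by instance-level values, not schema.
    sources = {
        "gdp_eu": {"Germany", "France", "2012", "GDP"},
        "unemployment_eu": {"Germany", "France", "2012", "unemployment"},
        "stock_prices_us": {"AAPL", "MSFT", "2012"},
    }
    print(contextualize("gdp_eu", sources, k=2))
```

Here `unemployment_eu` scores 3/5 and `stock_prices_us` scores 1/6, so the unemployment data set is recommended first. A real engine would replace the set overlap with a learned similarity, but the interface of selected source in and ranked recommendations out stays the same.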