Mapping RDF knowledge bases using exchange samples

The Web of Data is still in its early stages; it is currently organised into a variety of linked knowledge bases that have been developed independently by different organisations. RDF is one of the most popular languages for representing data in this context, which motivates the need to perform complex integration tasks amongst RDF knowledge bases. These tasks are performed using schema mappings, which are declarative specifications of the relationships between a source and a target knowledge base. Generating schema mappings automatically is appealing because it relieves users of the burden of handcrafting them. In the literature, the vast majority of proposals rely on the data models of the knowledge bases to be integrated, that is, on their classes, properties, and constraints. In the Web of Data, however, many data models comprise very few constraints or none at all, which has motivated some researchers to work on an alternative paradigm that does not rely on constraints. Unfortunately, the current proposals that fit this paradigm are not completely automatic. In this article, we present our proposal to automatically generate schema mappings amongst RDF knowledge bases. Its salient features are the following: it uses a single input exchange sample and a set of input correspondences, but requires neither constraints nor user intervention; it has been validated and evaluated in many experiments that demonstrate it is effective and efficient in practice; and the schema mappings it produces are GLAV. Other researchers can reproduce our experiments, since all of our implementations and repositories are publicly available.
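
To make "GLAV schema mapping" concrete, the sketch below shows what such a mapping means once it exists. It does not show how the proposal generates one. A GLAV mapping pairs a conjunctive pattern over the source with a conjunctive pattern over the target, and target-only variables are existential. This sketch applies one mapping with a naive chase: every match of the source pattern yields the instantiated target pattern, with a fresh blank node for each existential variable. The triple encoding, the `?`-prefixed variables, the helper names and the `src:`/`tgt:` vocabulary are all hypothetical and chosen for illustration. No core minimisation is attempted.

```python
from itertools import count
from typing import Dict, Iterator, List, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    # Hypothetical convention: variables start with "?", everything else is a constant.
    return term.startswith("?")


def match(pattern: List[Triple], graph: Set[Triple], binding: Binding) -> Iterator[Binding]:
    """Enumerate every binding that maps all triple patterns into the graph (backtracking join)."""
    if not pattern:
        yield binding
        return
    first, rest = pattern[0], pattern[1:]
    for triple in graph:
        extended = dict(binding)
        ok = True
        for p_term, g_term in zip(first, triple):
            if is_var(p_term):
                # Bind the variable, or check consistency with an earlier binding.
                if extended.setdefault(p_term, g_term) != g_term:
                    ok = False
                    break
            elif p_term != g_term:
                ok = False
                break
        if ok:
            yield from match(rest, graph, extended)


def apply_glav(source_pattern: List[Triple], target_pattern: List[Triple],
               source_graph: Set[Triple]) -> Set[Triple]:
    """Naive chase of one GLAV mapping: source pattern => target pattern."""
    fresh = count()
    target: Set[Triple] = set()
    for binding in match(source_pattern, source_graph, {}):
        local = dict(binding)
        # Target-only variables are existential: give each a fresh blank node per match.
        for triple in target_pattern:
            for term in triple:
                if is_var(term) and term not in local:
                    local[term] = f"_:b{next(fresh)}"
        for s, p, o in target_pattern:
            target.add((local.get(s, s), local.get(p, p), local.get(o, o)))
    return target


# Hypothetical source data and mapping.
source = {
    ("ex:alice", "src:name", '"Alice"'),
    ("ex:alice", "src:worksFor", "ex:acme"),
    ("ex:bob", "src:name", '"Bob"'),
}
mapping_source = [("?p", "src:name", "?n"), ("?p", "src:worksFor", "?o")]
mapping_target = [("?p", "tgt:fullName", "?n"), ("?p", "tgt:holds", "?j"),
                  ("?j", "tgt:employer", "?o")]
result = apply_glav(mapping_source, mapping_target, source)
```

Only `ex:alice` satisfies the whole source pattern, so `ex:bob` is not exchanged. The existential `?j` becomes a blank node that links the person to the employer. This is the kind of structure that the target side of a GLAV mapping can express but a plain attribute-to-attribute correspondence cannot.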