The Linked Data cloud contains large amounts of RDF data generated from databases. Much of this RDF data, generated using tools such as D2R, is expressed in terms of vocabularies automatically derived from the schema of the original database. The generated RDF would be significantly more useful if it were expressed in terms of commonly used vocabularies, but doing this with today’s tools is labor-intensive. For example, one can first use D2R to automatically generate RDF from a database and then use R2R to translate the automatically generated RDF into RDF expressed in a new vocabulary. The problem is that defining the R2R mappings is difficult and time-consuming because the mapping rules must be written in terms of SPARQL graph patterns. In this work, we present a semi-automatic approach for building mappings that translate data in structured sources to RDF expressed in terms of a vocabulary of the user’s choice. Our system, Karma, automatically derives these mappings and provides an easy-to-use interface that lets users guide the automated process toward the desired mappings. In our evaluation, users needed to interact with the system less than once per column (on average) to construct the desired mapping rules. The system then uses these mapping rules to generate semantically rich RDF for the data sources. We demonstrate Karma using a bioinformatics example and contrast it with other approaches used in that community. Bio2RDF [7] and Semantic MediaWiki Linked Data Extension (SMW-LDE) [2] are examples of efforts that integrate bioinformatics datasets by mapping them to a common vocabulary. We applied our approach to a scenario used in the SMW-LDE that integrates ABA, Uniprot, KEGG Pathway, PharmGKB and Linking Open Drug Data datasets using a
[1] Kristina Lerman, et al. Using Conditional Random Fields to Exploit Token Structure and Labels for Accurate Semantic Annotation. AAAI, 2011.
[2] Craig A. Knoblock, et al. Building Mashups by Demonstration. TWEB, 2011.
[3] Christian Becker, et al. Extending SMW+ with a Linked Data Integration Framework. ISWC Posters & Demos, 2010.
[4] Kristina Lerman, et al. Semi-automatically Mapping Structured Sources into the Semantic Web. ESWC, 2012.
[5] Eric Yu, et al. Conceptual Modeling: Foundations and Applications. 2009.
[6] Peter Ansell, et al. Model and prototype for querying multiple linked scientific datasets. Future Gener. Comput. Syst., 2011.
[7] Andrew McCallum, et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. ICML, 2001.
[8] Laura M. Haas, et al. Clio: Schema Mapping Creation and Data Exchange. Conceptual Modeling: Foundations and Applications, 2009.
[9] Asunción Gómez-Pérez, et al. Upgrading relational legacy data to the semantic web. WWW '06, 2006.
[10] Lora Aroyo, et al. The Semantic Web: Research and Applications. Lecture Notes in Computer Science, 2009.