Alignment and dataset identification of linked data in Semantic Web

The Linked Open Data (LOD) cloud has gained significant attention in the Semantic Web community over the past few years. Having expanded rapidly in size and diversity, it now consists of over 800 interlinked datasets with over 60 billion triples. These datasets encapsulate structured data and knowledge spanning varied domains such as entertainment, life sciences, publications, geography, and government. Applications can take advantage of this by using the knowledge distributed over the interconnected datasets, which cannot realistically be found in any single place. However, two key obstacles to using the LOD cloud are the limited support for data integration tasks over concepts, instances, and properties, and the difficulty of selecting relevant data sources when querying over multiple datasets. We briefly review some of the important and interesting technical approaches in the literature that address these two issues. We observe that general-purpose alignment techniques developed outside the LOD context fall short in coping with the heterogeneous data representation of LOD. Therefore, an LOD-specific review of these techniques (especially for alignment) is important to the community. The topics covered in this article fall under two broad categories: alignment techniques for LOD datasets, and relevant data source selection in the context of query processing over LOD datasets.
