Linked Open Data Validity - A Technical Report from ISWS 2018

Linked Open Data (LOD) is the publicly available RDF data on the Web. Each LOD entity is identified by a URI and accessible via HTTP. LOD encodes global-scale knowledge that is potentially available to any human, as well as to any artificial intelligence that may want to use it as background knowledge for supporting its tasks. LOD has emerged as the backbone of applications in diverse fields such as Natural Language Processing, Information Retrieval, Computer Vision, Speech Recognition, and many more. Nevertheless, regardless of the specific tasks that LOD-based tools aim to address, reusing such knowledge may be challenging for diverse reasons, e.g. semantic heterogeneity, provenance, and data quality. As aptly stated by Heath et al., "Linked Data might be outdated, imprecise, or simply wrong": hence the need to investigate the problem of Linked Data validity. This work reports a collaborative effort by nine teams of students, guided by an equal number of senior researchers, attending the International Semantic Web Research School (ISWS 2018), to investigate this problem from different perspectives and with different approaches.
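To make concrete the statement that each LOD entity is a URI dereferenceable over HTTP, here is a minimal Python sketch using only the standard library. It assumes the publisher supports HTTP content negotiation, so that an `Accept: text/turtle` header yields an RDF serialization rather than an HTML page; the DBpedia URI is only an example, and the helper names `build_lod_request` and `fetch_rdf` are hypothetical, not part of any tool described in this report.

```python
import urllib.request


# Hypothetical helper: build an HTTP GET request for a Linked Data URI that asks,
# via content negotiation, for an RDF serialization (Turtle by default) instead of HTML.
def build_lod_request(uri: str, media_type: str = "text/turtle") -> urllib.request.Request:
    return urllib.request.Request(uri, headers={"Accept": media_type}, method="GET")


# Hypothetical helper: dereference the URI over HTTP and return the response body as text.
# The default opener follows redirects such as 303 See Other automatically.
# Requires network access and assumes the server honours the Accept header.
def fetch_rdf(uri: str, media_type: str = "text/turtle", timeout: float = 10.0) -> str:
    request = build_lod_request(uri, media_type)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


# Build (but do not send) a request for an example DBpedia resource.
request = build_lod_request("http://dbpedia.org/resource/Berlin")
print(request.full_url, request.get_header("Accept"))
```

Calling `fetch_rdf("http://dbpedia.org/resource/Berlin")` needs network access; the returned Turtle text could then be parsed with an RDF library before any validity checks are applied. Keeping request construction separate from fetching lets the negotiated headers be inspected offline.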

Mathias Bonduel | Aldo Gangemi | Sebastian Rudolph | David Chaves-Fraga | Lucie-Aimée Kaffee | Maria-Esther Vidal | Mehwish Alam | Valentina Presutti | Harald Sack | Marieke van Erp | Claudia d'Amato | Ettore Rizza | Sven Lieber | Elena Camossi | Shruthi Chari | Amina Annane | Tabea Tietz | Quentin Brabant | Swati Padhee | Ludovica Marinucci | Prashant Khare | Pasquale Lisena | Michael Cochez | Valentina Anita Carriero | Dalia Varanka | Ilkcan Keles | Thiviyan Thanapalasingam | Nyoman Juniarta | Subhi Issa | Maximilian Zocholl | Luca Sciullo | Lars Heling | Thomas Minier | Sylwia Ozdowska | Russa Biswas | Benjamin Moreau | Vincenzo Cutrona | Humasak Simanjuntak | Durgesh Nandini | Tatiana Makhalova | Rahma Dandan | Giuseppe Futia | Valentina Leone | Roberto Reda | Noura Herradi | Amr Azzam | Viktor Kovtun | Siying Li | Arnaud Grall | Samaneh Jozashoori | Ahmed El Amine Djebri | Valerio Di Carlo | Fiorela Ciroku | Michael Wolowyk | Carlo Stomeo | Alberto Moya Loustaunau | Amanda Pacini de Moura | Faiq Miftakhul Falakh | Tayeb Abderrahmani Ghor | Esha Agrawal | Omar Alqawasmeh | Andrew Berezovskyi | Cristina-Iulia Bucur | Hubert Curien | Danilo Dessì | Alba Fernández Izquierdo | Simone Gasperoni | Pierre Henri | Guillermo Palma | Pedro Del Pozo Jiménez | Henry Rosales-Méndez
