Structuring Linked Data Search Results Using Probabilistic Soft Logic

Generating integrated representations of Linked Data (LD) search results on the fly is challenging because it requires automating a number of complex subtasks, such as structure inference and the matching of both instances and concepts, each of which yields uncertain outcomes. Such uncertainty is unavoidable given the semantically heterogeneous nature of web sources, including LD ones. This paper treats the structuring of LD search results as an evidence-based problem. In particular, it shows how one formalism, probabilistic soft logic (PSL), can be exploited to assimilate different sources of evidence in a principled way and to beneficial effect for users. The paper considers syntactic evidence derived from matching algorithms, semantic evidence derived from LD vocabularies, and user evidence in the form of feedback. The main contributions are: (i) sets of PSL rules that model the uniform assimilation of diverse kinds of evidence; (ii) an empirical evaluation of how well the resulting PSL programs infer structure for integrating LD search results; and (iii) a concrete example showing that populating such inferred structures for presentation to the end user is beneficial and also enables the collection of feedback whose assimilation further improves search result presentation.
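To make the idea of assimilating evidence in PSL more concrete, the following is a minimal sketch of how a single weighted PSL rule scores a candidate inference under the Lukasiewicz relaxation that PSL uses for soft truth values. The rule, the predicate names (SimilarName, SameVocabulary, EquivalentProperty), the truth values, and the weight are hypothetical illustrations and are not taken from the paper's rule sets.

```python
# Illustrative sketch only: the rule, predicate names, truth values, and weight
# below are hypothetical and are not the rules used in the paper.


def l_and(a: float, b: float) -> float:
    """Lukasiewicz conjunction over soft truth values in [0, 1]."""
    return max(0.0, a + b - 1.0)


def distance_to_satisfaction(body: float, head: float) -> float:
    """Distance to satisfaction of the rule body -> head (0 means satisfied)."""
    return max(0.0, body - head)


# Hypothetical rule:
#   SimilarName(P1, P2) & SameVocabulary(P1, P2) -> EquivalentProperty(P1, P2)
similar_name = 0.9         # syntactic evidence, e.g. a string-matcher score
same_vocabulary = 0.8      # semantic evidence, e.g. drawn from an LD vocabulary
equivalent_property = 0.5  # candidate truth value for the inferred atom

body = l_and(similar_name, same_vocabulary)
penalty = distance_to_satisfaction(body, equivalent_property)

# A rule's weight scales its penalty; inference looks for truth values of the
# unobserved atoms that minimise the total weighted penalty over all rules.
weight = 2.0
weighted_penalty = weight * penalty

print(f"body={body:.2f} penalty={penalty:.2f} weighted={weighted_penalty:.2f}")
```

In this sketch, raising the candidate value of the inferred atom towards the body's truth value reduces the penalty to zero, which is how strong evidence in a rule body pushes an inferred atom towards truth. User feedback could be modelled in the same way, as further observed atoms appearing in rule bodies.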
