A Query-Driven Characterization of Linked Data

Due to the Linked Data initiative, the once unpopulated Semantic Web is now rapidly being populated with millions of facts stored in RDF. Could any of this data possibly be interesting to ordinary users? In this study, we run queries extracted from the query log of a major hypertext search engine against a Semantic Web search engine to determine whether the Semantic Web has anything of interest to the average Web user. There is indeed much Semantic Web information that could be relevant to many queries for entities (such as people and places) and abstract concepts, although these possibly relevant results are overwhelmingly clustered around DBpedia. We present an empirical analysis of the results, focusing on their major sources, the structure of the triples, the use of various RDF and OWL constructs, and the power-law distributions produced by both the URIs that serve Linked Data and the URIs in the triples themselves. The issues of 303 redirection and URI identity are given in-depth treatment.
