A Query-Driven Characterization of Linked Data

Due to the Linked Data initiative, the once unpopulated Semantic Web is now rapidly being populated with millions of facts stored in RDF. Could any of this data possibly be interesting to ordinary users? In this study, we run queries extracted from the query log of a major hypertext search engine against a Semantic Web search engine to determine whether the Semantic Web has anything of interest to the average Web user. We find that there is indeed much Semantic Web information that could be relevant to many queries for entities (such as people and places) and abstract concepts, although these possibly relevant results are overwhelmingly clustered around DBPedia. We present an empirical analysis of the results, focusing on their major sources, the structure of the triples, the use of various RDF and OWL constructs, and the power-law distributions produced by both the URIs that serve Linked Data and the URIs in the triples themselves. The issues of 303 redirection and URI identity are given in-depth treatment.

