Investigating the Semantic Gap through Query Log Analysis

In recent years, significant effort has gone into bringing large amounts of metadata online, and the success of these efforts can be seen in the impressive number of web sites exposing data in RDFa or RDF/XML. However, little is known about the extent to which this data fits the needs of ordinary web users with everyday information needs. In this paper we study what we perceive as the semantic gap between the supply of data on the Semantic Web and the needs of web users as expressed in the queries submitted to a major Web search engine. We perform our analysis at the level of both instances and ontologies. First, we look at how much data is actually relevant to Web queries and what kind of data it is. Second, we provide a generic method to extract the attributes that Web users are searching for regarding particular classes of entities. This method allows us to contrast class definitions found in Semantic Web vocabularies with the attributes of objects that users are interested in. Our findings are crucial to measuring the potential of semantic search, but also speak to the state of the Semantic Web in general.
