EXPLOITING ENTITY SEMANTICS FOR QUERY EXPANSION

Many user queries now contain references to named entities, which has motivated the development of methods that exploit entity semantics for query expansion. At the same time, Wikipedia has been widely recognized as a large network of named entities, in which entity-related articles are organized into a comprehensive hierarchy of categories and present summarized information about these entities in so-called infoboxes. In this paper, we present a new query expansion method that uses entity semantics derived from Wikipedia. The main appeal of our method is that, unlike previous methods in the literature, it exploits valuable human-refined information available in infoboxes to obtain candidate query expansion terms and to associate entities identified in queries with categories. Indeed, by taking advantage of the semantic structure implicitly provided by infobox templates, we adapt well-known term-selection functions to deal properly with entities, ultimately improving their accuracy in selecting good query expansion terms. Experimental results show that our method achieves gains of 19.05% (from 0.1370 to 0.1631) in MAP and 77.99% (from 0.2381 to 0.4238) in P@10. In addition, the two approaches for obtaining expansion terms based on information found in infoboxes offer a better trade-off between quality of results and time required to process the expanded query.
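
The reported percentages are consistent with a relative improvement over the lower score in each pair. The following minimal sketch recomputes them under that assumption. The function name `relative_gain` is hypothetical and used only for illustration, and the abstract does not say which baseline system the lower scores belong to.

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline`, in percent.

    Assumes the gain is (improved - baseline) / baseline, which is
    our reading of the figures quoted in the abstract.
    """
    return (improved - baseline) / baseline * 100.0


# Scores quoted in the abstract (lower score first, higher score second).
map_gain = relative_gain(0.1370, 0.1631)
p10_gain = relative_gain(0.2381, 0.4238)

print(f"MAP gain:  {map_gain:.2f}%")   # 19.05%
print(f"P@10 gain: {p10_gain:.2f}%")   # 77.99%
```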