Entity retrieval has attracted considerable interest from the research community over the past decade. Ten years ago, the expertise retrieval task gained popularity during the TREC Enterprise Track [10]. It has remained relevant ever since, broadening to social media, to tracking the dynamics of expertise [1-5, 8, 11], and, more generally, to a range of entity retrieval tasks.

In the talk, which will be given by the second author, we will point out that existing methods for entity and expert retrieval fail to address key challenges: (1) Queries and expert documents use different representations to describe the same concepts [6, 7]. Term mismatches between queries and experts [7] occur because widely used maximum-likelihood language models cannot exploit semantic similarities between words [9]. (2) As the amount of available data increases, so does the need for approaches with greater learning capabilities than smoothed maximum-likelihood language models [13]. (3) Supervised methods for entity and expertise retrieval [5, 8] were introduced at the turn of the last decade. However, as data availability accelerates, supervised methods require manual annotation efforts that grow at a similar rate, which calls for the further development of unsupervised methods. (4) Some entity and expertise retrieval methods construct a language model for every document in the collection. These methods lack efficient query capabilities for large document collections, as each query term needs to be matched against every document [2].

In the talk we will discuss a recently proposed solution [12] that emphasizes unsupervised model construction, efficient query capabilities and, most importantly, semantic matching between query terms and candidate entities. We show that the proposed approach improves retrieval performance compared to generative language models, mainly due to its ability to perform semantic matching [7]. The proposed method does not require any annotations or supervised relevance judgments and learns from raw textual evidence and document-candidate associations alone. The purpose of the proposal is to provide insight into how we avoid explicit annotations and feature engineering and still obtain semantically meaningful retrieval results. In the talk we will provide a comparative error analysis between the proposed semantic entity retrieval model and traditional generative language models that perform exact matching, which yields important insights into the relative strengths of semantic matching and exact matching for the expert retrieval task in particular and entity retrieval in general. We will also discuss extensions of the proposed model that address scalability and the dynamic aspects of entity and expert retrieval.
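To make the contrast between exact matching and semantic matching concrete, the sketch below is a minimal illustration only: the function names, the smoothing parameter, and the toy data are our own assumptions and are not taken from the cited work [12]. It scores a query against a candidate's text in two ways, with a Jelinek-Mercer-smoothed maximum-likelihood document language model, which rewards only exact term overlap, and with a cosine similarity between averaged word embeddings, which can also reward semantically related but non-identical terms.

```python
# Illustrative sketch only: exact vs. semantic matching for expert retrieval.
# Function names, the smoothing weight, and the toy embeddings are hypothetical.
import math
from collections import Counter


def lm_score(query_terms, doc_terms, collection_terms, lam=0.1):
    """Log-likelihood of the query under a Jelinek-Mercer-smoothed
    maximum-likelihood document language model (exact matching only)."""
    doc_tf, coll_tf = Counter(doc_terms), Counter(collection_terms)
    doc_len, coll_len = len(doc_terms), len(collection_terms)
    score = 0.0
    for t in query_terms:
        p_doc = doc_tf[t] / doc_len if doc_len else 0.0
        p_coll = coll_tf[t] / coll_len if coll_len else 0.0
        score += math.log((1 - lam) * p_doc + lam * p_coll + 1e-12)
    return score


def semantic_score(query_terms, doc_terms, embeddings):
    """Cosine similarity between averaged term embeddings: related but
    non-identical terms can still contribute to the match."""
    def centroid(terms):
        vecs = [embeddings[t] for t in terms if t in embeddings]
        if not vecs:
            return None
        dim = len(vecs[0])
        return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

    q, d = centroid(query_terms), centroid(doc_terms)
    if q is None or d is None:
        return 0.0
    dot = sum(a * b for a, b in zip(q, d))
    norm = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in d))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Toy data: the query term "retrieval" never occurs in the candidate's
    # document, so the exact-matching score falls back to collection smoothing,
    # while the embedding score can still reward the related term "search".
    collection = "search engines index documents for retrieval and ranking".split()
    doc = "this expert works on search engines and ranking".split()
    query = ["retrieval"]
    toy_embeddings = {"retrieval": [1.0, 0.2], "search": [0.9, 0.3],
                      "ranking": [0.7, 0.5], "engines": [0.4, 0.8]}
    print(lm_score(query, doc, collection))
    print(semantic_score(query, doc, toy_embeddings))
```

Under the exact-matching score, a query term that never occurs in a candidate's documents contributes only its collection-smoothed probability, whereas the embedding-based score can remain high when the documents use related vocabulary; this is the term-mismatch effect referred to in challenge (1) above.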
[1] Hang Li et al. Semantic Matching in Search. SMIR@SIGIR, 2014.
[2] Yi Fang et al. Modeling the dynamics of personal expertise. SIGIR, 2014.
[3] M. de Rijke et al. A language modeling framework for expert finding. Inf. Process. Manag., 2009.
[4] Geoffrey E. Hinton et al. Learning distributed representations of concepts. 1989.
[5] M. de Rijke et al. Expertise Retrieval. Found. Trends Inf. Retr., 2012.
[6] Pável Calado et al. Using Rank Aggregation for Expert Search in Academic Digital Libraries. ArXiv, 2015.
[7] Luo Si et al. Discriminative models of integrating document evidence and document-candidate associations for expert search. SIGIR '10, 2010.
[8] M. de Rijke et al. On the Assessment of Expertise Profiles. DIR, 2013.
[9] Marcel Worring et al. Unsupervised, Efficient and Semantic Expertise Retrieval. WWW, 2016.
[10] Vladimir Vapnik et al. Statistical learning theory. 1998.
[11] Geoffrey E. Hinton et al. Semantic hashing. Int. J. Approx. Reason., 2009.
[12] David van Dijk et al. Early Detection of Topical Expertise in Community Question Answering. SIGIR, 2015.