ERD 2014 was a research challenge focused on recognizing and disambiguating knowledge base entities in short and long texts. This write-up describes the Magnetic-IISAS team's approach to entity recognition in search queries, with which we participated in the ERD 2014 challenge. Our approach combines information retrieval techniques, gazetteer-based annotation, and entity link graph analysis to identify and disambiguate candidate entities. We built a search index with multiple structured fields extracted from Wikipedia, Freebase, and DBpedia. When processing a query, we first retrieve the top matching entities from the index. For all retrieved entities, we gather their surface forms, that is, the plausible verbalizations by which those entities may be referred to. We then match the gathered surface forms against the original query to confirm each entity's relevance to the query. Finally, we exploit the Wikipedia link graph to assess the similarity of candidate entities for the purposes of disambiguation and further candidate filtering. In the paper, we discuss both the successful and the unsuccessful attempts to improve the quality of the system's results that we made during the course of the challenge.
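The query-processing steps described above (candidate retrieval, surface-form matching, and link-graph-based disambiguation) can be illustrated with a minimal sketch. This is not the system's actual implementation: the `Entity` class, all function names, the toy in-memory index, the token-overlap ranking, and the use of Jaccard overlap of inlink sets as the similarity measure are assumptions chosen only to keep the example short and self-contained.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entity:
    # Hypothetical record; a real index would hold many structured fields.
    entity_id: str
    surface_forms: set[str]                      # lowercase verbalizations
    inlinks: set[str] = field(default_factory=set)  # pages linking to the entity


def retrieve_candidates(query: str, index: list[Entity], top_k: int = 10) -> list[Entity]:
    """Rank entities by token overlap with the query (stand-in for a real IR engine)."""
    tokens = set(query.lower().split())
    scored = []
    for entity in index:
        vocab = {tok for form in entity.surface_forms for tok in form.split()}
        overlap = len(tokens & vocab)
        if overlap:
            scored.append((overlap, entity))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entity for _, entity in scored[:top_k]]


def match_surface_forms(query: str, candidates: list[Entity]) -> list[tuple[Entity, str]]:
    """Keep candidates with a surface form occurring as a whole-token span in the query."""
    padded = f" {' '.join(query.lower().split())} "
    matches = []
    for entity in candidates:
        hits = [form for form in entity.surface_forms if f" {form} " in padded]
        if hits:
            matches.append((entity, max(hits, key=len)))  # longest matching form
    return matches


def link_similarity(a: Entity, b: Entity) -> float:
    """Jaccard overlap of inlink sets (an assumed, simplified relatedness measure)."""
    union = a.inlinks | b.inlinks
    return len(a.inlinks & b.inlinks) / len(union) if union else 0.0


def disambiguate(matches: list[tuple[Entity, str]]) -> dict[str, str]:
    """Per mention, keep the candidate most related to candidates of the other mentions."""
    by_mention: dict[str, list[Entity]] = {}
    for entity, mention in matches:
        by_mention.setdefault(mention, []).append(entity)
    result = {}
    for mention, entities in by_mention.items():
        others = [e for m, es in by_mention.items() if m != mention for e in es]
        best = max(entities, key=lambda e: sum(link_similarity(e, o) for o in others))
        result[mention] = best.entity_id
    return result


def annotate(query: str, index: list[Entity]) -> dict[str, str]:
    """Run the three steps in sequence and return a mention-to-entity mapping."""
    return disambiguate(match_surface_forms(query, retrieve_candidates(query, index)))


# Toy data, invented for illustration only.
toy_index = [
    Entity("Total_Recall_(1990_film)", {"total recall"},
           {"Arnold_Schwarzenegger", "Paul_Verhoeven"}),
    Entity("Total_Recall_(2012_film)", {"total recall"},
           {"Colin_Farrell", "Len_Wiseman"}),
    Entity("Arnold_Schwarzenegger", {"arnold schwarzenegger", "schwarzenegger"},
           {"Paul_Verhoeven", "The_Terminator"}),
]
annotations = annotate("total recall schwarzenegger", toy_index)
print(annotations)
```

In this toy example, both films match the mention "total recall", and the shared inlink with the actor's page is what tips the choice toward the 1990 film. A production system would replace the token-overlap ranking with a proper retrieval model and would likely use a more robust link-based relatedness measure.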