Keyword-Driven Resource Disambiguation over RDF Knowledge Bases

Keyword search is the most popular way to access information. In this paper we introduce a novel approach, based on a hidden Markov model, for determining the correct resources for user-supplied queries. The user-supplied query is modeled as the observed data, and the background knowledge is used for parameter estimation: we leverage the semantic relationships between resources to estimate the model parameters. In this approach, query segmentation and resource disambiguation are tightly interwoven. First, an initial set of potential segments is obtained by leveraging the underlying knowledge base; then, the final set of segments is determined once the most likely resource mapping has been computed. While linguistic analysis (e.g. named entity recognition, multi-word unit recognition and POS tagging) fails in the case of keyword-based queries, we show that our statistical approach is robust with regard to variance in query expression. Our experimental results are very promising.
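To make the modeling idea concrete, the following is a minimal sketch of how a most likely resource mapping could be decoded once a hidden Markov model is in place: query segments act as observations, candidate resources act as hidden states, and standard Viterbi decoding picks the best state sequence. The resource names (`Jaguar_Animal`, `Jaguar_Cars`, `engine`) and all probabilities are invented for illustration; the abstract does not give the actual parameter estimation, so the numbers below merely stand in for values that would be derived from the knowledge base.

```python
import math

FLOOR = 1e-12  # assumed smoothing floor so that log() never receives zero


def log_p(p):
    return math.log(max(p, FLOOR))


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for the observations."""
    # best[t][s] = (log-probability of the best path ending in s at step t,
    #               predecessor state on that path)
    first = observations[0]
    best = [{
        s: (log_p(start_p.get(s, 0)) + log_p(emit_p.get(s, {}).get(first, 0)), None)
        for s in states
    }]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            score, pred = max(
                (prev[r][0] + log_p(trans_p.get(r, {}).get(s, 0)), r)
                for r in states
            )
            column[s] = (score + log_p(emit_p.get(s, {}).get(obs, 0)), pred)
        best.append(column)

    # Backtrack from the best final state to recover the full path.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path


# Hypothetical toy model: the keyword "jaguar" is ambiguous between two resources.
states = ["Jaguar_Animal", "Jaguar_Cars", "engine"]
start_p = {"Jaguar_Animal": 0.5, "Jaguar_Cars": 0.4, "engine": 0.1}
# Transition weights stand in for semantic relatedness between resources.
trans_p = {
    "Jaguar_Animal": {"engine": 0.05},
    "Jaguar_Cars": {"engine": 0.8},
}
emit_p = {
    "Jaguar_Animal": {"jaguar": 0.9},
    "Jaguar_Cars": {"jaguar": 0.9},
    "engine": {"engine": 0.9},
}

decoded = viterbi(["jaguar", "engine"], states, start_p, trans_p, emit_p)
print(decoded)
```

In this toy setting the prior slightly favors the animal reading of "jaguar", but the strong transition weight toward `engine` makes the car manufacturer the more likely resource for the whole query, which is the kind of context-driven disambiguation the approach aims for.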
