Learning to rank for quantity consensus queries

Web search increasingly exploits named entities such as persons, places, businesses, addresses, and dates. Entity ranking is also of current interest at INEX and TREC. Numerical quantities are an important class of entities, especially in queries about prices and features of products, services, and travel. We introduce Quantity Consensus Queries (QCQs), where each answer is a tight quantity interval distilled from evidence of relevance in thousands of snippets. Entity search and factoid question answering have benefited from aggregating evidence across multiple promising snippets, but those techniques do not readily apply to quantities. We propose two new algorithms that learn to aggregate information from multiple snippets. We show that typical signals used in entity ranking, such as the rarity of query words and their lexical proximity to candidate quantities, are very noisy. Our algorithms learn to score and rank quantity intervals directly, combining snippet quantity and snippet text information. We report on experiments with hundreds of QCQs whose ground truth is taken from TREC QA, Wikipedia infoboxes, and other sources, yielding tens of thousands of candidate snippets and quantities. Our algorithms achieve about 20% better MAP and NDCG than the best-known collective rankers, and are 35% better than scoring snippets independently of each other.
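To make the idea of interval-level aggregation concrete, here is a minimal sketch in Python. It is not the paper's learned ranker: it assumes a simple bag-of-words overlap signal and a greedy relative-width rule for grouping candidate quantities into intervals, and the names `Snippet`, `query_overlap`, and `score_intervals` are illustrative, not the paper's API.

```python
# Minimal sketch of scoring quantity intervals by aggregating snippet
# evidence. Illustrative heuristic only, NOT the paper's learned ranker:
# the overlap feature and the interval-building rule are assumptions.

from dataclasses import dataclass

@dataclass
class Snippet:
    text: str        # snippet text
    quantity: float  # candidate quantity extracted from the snippet

def query_overlap(query: str, text: str) -> float:
    """Fraction of query words appearing in the snippet (a noisy signal)."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / max(len(q), 1)

def score_intervals(query: str, snippets: list[Snippet],
                    width: float = 0.05) -> list[tuple[tuple[float, float], float]]:
    """Greedily group nearby quantities into intervals of relative width
    `width`, then score each interval by summing the overlap signal of
    its supporting snippets (consensus = pooled evidence)."""
    snippets = sorted(snippets, key=lambda s: s.quantity)
    intervals = []
    i = 0
    while i < len(snippets):
        lo = snippets[i].quantity
        hi = lo + abs(lo) * width  # relative-width interval (assumption)
        group = [s for s in snippets[i:] if s.quantity <= hi]
        score = sum(query_overlap(query, s.text) for s in group)
        intervals.append(((lo, group[-1].quantity), score))
        i += len(group)
    return sorted(intervals, key=lambda x: -x[1])
```

For a query like "height of Mount Everest in metres", snippets whose quantities cluster near 8848 would pool their overlap scores into one high-scoring interval, while stray quantities end up in weakly supported singleton intervals; the learned rankers in the paper replace the hand-set overlap heuristic with trained scoring of such intervals.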
