Learning to rank for quantity consensus queries

Web search increasingly exploits named entities such as persons, places, businesses, addresses, and dates. Entity ranking is also of current interest at INEX and TREC. Numerical quantities are an important class of entities, especially in queries about prices and features of products, services, and travel. We introduce Quantity Consensus Queries (QCQs), where each answer is a tight quantity interval distilled from evidence of relevance in thousands of snippets. Entity search and factoid question answering have benefited from aggregating evidence across multiple promising snippets, but those techniques do not readily apply to quantities. We propose two new algorithms that learn to aggregate information from multiple snippets. We show that typical signals used in entity ranking, such as the rarity of query words and their lexical proximity to candidate quantities, are very noisy. Our algorithms learn to score and rank quantity intervals directly, combining snippet quantity and snippet text information. We report on experiments with hundreds of QCQs whose ground truth is taken from TREC QA, Wikipedia infoboxes, and other sources, yielding tens of thousands of candidate snippets and quantities. Our algorithms achieve about 20% better MAP and NDCG than the best-known collective rankers, and are 35% better than scoring snippets independently of each other.
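To make the idea of interval-level aggregation concrete, here is a minimal sketch in Python. It is not the paper's learned ranker: it assumes a simple bag-of-words overlap signal and a greedy relative-width rule for grouping candidate quantities into intervals, and the names `Snippet`, `query_overlap`, and `score_intervals` are illustrative, not the paper's API.

```python
# Minimal sketch of scoring quantity intervals by aggregating snippet
# evidence. Illustrative heuristic only, NOT the paper's learned ranker:
# the overlap feature and the interval-building rule are assumptions.

from dataclasses import dataclass

@dataclass
class Snippet:
    text: str        # snippet text
    quantity: float  # candidate quantity extracted from the snippet

def query_overlap(query: str, text: str) -> float:
    """Fraction of query words appearing in the snippet (a noisy signal)."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / max(len(q), 1)

def score_intervals(query: str, snippets: list[Snippet],
                    width: float = 0.05) -> list[tuple[tuple[float, float], float]]:
    """Greedily group nearby quantities into intervals of relative width
    `width`, then score each interval by summing the overlap signal of
    its supporting snippets (consensus = pooled evidence)."""
    snippets = sorted(snippets, key=lambda s: s.quantity)
    intervals = []
    i = 0
    while i < len(snippets):
        lo = snippets[i].quantity
        hi = lo + abs(lo) * width  # relative-width interval (assumption)
        group = [s for s in snippets[i:] if s.quantity <= hi]
        score = sum(query_overlap(query, s.text) for s in group)
        intervals.append(((lo, group[-1].quantity), score))
        i += len(group)
    return sorted(intervals, key=lambda x: -x[1])
```

For a query like "height of Mount Everest in metres", snippets whose quantities cluster near 8848 would pool their overlap scores into one high-scoring interval, while stray quantities end up in weakly supported singleton intervals; the learned rankers in the paper replace the hand-set overlap heuristic with trained scoring of such intervals.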
