Coreference-aware web object retrieval

As user demands grow more sophisticated, today's search engines compete on more than returning document results from the Web. One area of competition is returning web objects built from structured data extracted from many information sources. We address keyword retrieval over a collection of such objects that contains a large degree of duplication, because different Web-based information sources describe the same object. We develop a coreference-aware retrieval method that performs topic-specific coreference resolution on the retrieved objects to improve object search results. Our results in the domain of local search show that coreference has a significant impact on retrieval effectiveness: a coreference-aware system outperforms naive object retrieval by more than 20% in precision at 5 and at 10 (P5 and P10).
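The abstract only summarizes the method. The sketch below illustrates the general idea of collapsing coreferent objects in a ranked result list at query time and how that can raise precision at k. It does not reproduce the paper's topic-specific coreference model. A hand-written pairwise matcher stands in for it: two objects match if they share a phone number or if their name tokens have a Jaccard similarity of at least 0.5. It also assumes that a repeated description of an already-shown entity earns no credit in P@k. `WebObject`, `is_coreferent`, `coreference_aware_rank`, the threshold, and the toy data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WebObject:
    # Hypothetical record: one source's description of a local business.
    obj_id: str
    name: str
    phone: str
    score: float  # score assigned by a baseline keyword ranker


def name_similarity(a: str, b: str) -> float:
    """Jaccard similarity of lowercased whitespace tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def is_coreferent(x: WebObject, y: WebObject, threshold: float = 0.5) -> bool:
    # Hypothetical rule standing in for a learned coreference model:
    # a shared non-empty phone number, or sufficiently similar names.
    if x.phone and x.phone == y.phone:
        return True
    return name_similarity(x.name, y.name) >= threshold


def coreference_aware_rank(results: list[WebObject]) -> list[WebObject]:
    """Greedily cluster retrieved objects in score order and return one
    representative (the highest-scoring member) per cluster."""
    clusters: list[list[WebObject]] = []
    for obj in sorted(results, key=lambda o: o.score, reverse=True):
        for cluster in clusters:
            if any(is_coreferent(obj, member) for member in cluster):
                cluster.append(obj)
                break
        else:  # no existing cluster matched: start a new one
            clusters.append([obj])
    return [cluster[0] for cluster in clusters]


def precision_at_k(ranked, relevant_entities, entity_of, k):
    """P@k in which each relevant entity is credited at most once, so a
    repeated description of an already-shown entity earns nothing
    (an illustrative assumption about how duplicates are judged)."""
    seen, hits = set(), 0
    for obj in ranked[:k]:
        entity = entity_of[obj.obj_id]
        if entity in relevant_entities and entity not in seen:
            hits += 1
            seen.add(entity)
    return hits / k


# Toy example: two sources describe the same pizzeria.
results = [
    WebObject("a1", "Joe's Pizza", "555-0100", 0.9),
    WebObject("b7", "joe's pizza", "555-0100", 0.8),
    WebObject("c3", "Luigi's Trattoria", "555-0199", 0.7),
]
entity_of = {"a1": "joes", "b7": "joes", "c3": "luigis"}
relevant = {"joes", "luigis"}

naive = sorted(results, key=lambda o: o.score, reverse=True)
merged = coreference_aware_rank(results)
print([o.obj_id for o in merged])                      # ['a1', 'c3']
print(precision_at_k(naive, relevant, entity_of, 2))   # 0.5
print(precision_at_k(merged, relevant, entity_of, 2))  # 1.0
```

The single greedy pass makes a quadratic number of pairwise comparisons in the worst case. That cost is acceptable when only the top few dozen retrieved objects are resolved at query time rather than the whole collection.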
