Context-sensitive ranking

A growing number of applications involve both structured and unstructured data. A simple example is academic citations: while a citation's content is unstructured text, the citation is also associated with structured data such as the author list, categories and publication time. To query such hybrid data, a natural approach is to combine structured queries with keyword search. Two fundamental problems arise from this combination: (1) How can hybrid queries be evaluated efficiently? (2) How should relevance ranking be modeled? The second problem is especially difficult, because the foundations of relevance ranking in information retrieval were built for unstructured text and do not take structure into account. We present context-sensitive ranking, a ranking framework that integrates structured queries and relevance ranking. The key insight is that structured queries provide expressive search contexts. The ranking model collects keyword statistics within these contexts and feeds them into conventional ranking formulas to compute ranking scores. The challenge in query evaluation is computing keyword statistics at runtime, which involves expensive online aggregations. At the core of our solution to this efficiency problem is a reduction from computing keyword statistics to answering aggregation queries. Many statistics, such as document frequency, require aggregations over the data space returned by the structured query. This is analogous to analytical queries in OLAP applications, which involve a large number of aggregations. We leverage and extend research on materialized views in OLAP to deliver algorithms and data structures that evaluate context-sensitive ranking efficiently.
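The idea can be illustrated with a minimal Python sketch. It is not the algorithms or data structures of this work: the toy corpus, the field names, the function names and the particular TF-IDF variant are all assumptions chosen for illustration. The structured predicate selects the context, document frequencies are counted inside that context only, and a conventional formula turns them into scores. The last function precomputes per-category document frequencies, a simplified stand-in for answering the aggregation from a materialized view.

```python
import math
from collections import Counter

# Hypothetical toy corpus: structured fields plus unstructured text.
DOCS = [
    {"id": 1, "category": "databases", "year": 2006,
     "text": "ranking keyword search over relational data"},
    {"id": 2, "category": "databases", "year": 2008,
     "text": "materialized view selection for aggregation queries"},
    {"id": 3, "category": "databases", "year": 2010,
     "text": "keyword search with materialized view support"},
    {"id": 4, "category": "biology", "year": 2009,
     "text": "gene ranking and protein search"},
    {"id": 5, "category": "biology", "year": 2010,
     "text": "protein folding search heuristics"},
]


def context_statistics(docs, predicate):
    """Return the documents matching the structured predicate (the context)
    and the document frequency of every term inside that context."""
    context = [d for d in docs if predicate(d)]
    df = Counter()
    for d in context:
        df.update(set(d["text"].split()))  # count each term once per document
    return context, df


def rank(docs, predicate, keywords):
    """Score context documents with a simple TF-IDF variant (an assumption;
    any conventional formula could consume the same statistics)."""
    context, df = context_statistics(docs, predicate)
    n = len(context)
    scored = []
    for d in context:
        tf = Counter(d["text"].split())
        score = sum(tf[k] * math.log(1 + n / df[k]) for k in keywords if tf[k])
        if score > 0:
            scored.append((d["id"], score))
    # Highest score first; ties broken by document id.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


def materialize_df_view(docs, attribute):
    """Precompute document frequencies grouped by one structured attribute,
    so a query on that attribute avoids the online aggregation."""
    view = {}
    for d in docs:
        view.setdefault(d[attribute], Counter()).update(set(d["text"].split()))
    return view


def in_databases(d):
    return d["category"] == "databases"


results = rank(DOCS, in_databases, ["search", "view"])
_, global_df = context_statistics(DOCS, lambda d: True)
_, context_df = context_statistics(DOCS, in_databases)
view = materialize_df_view(DOCS, "category")
```

In this toy data the document frequency of "search" differs between the whole collection and the "databases" context, which is why the same keyword query can rank differently under different structured predicates. The precomputed view returns the same counts as the online aggregation for any predicate that selects a single category; predicates over other attributes would need other views.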
