Ranking-based processing of SQL queries

A growing number of applications are built on top of search engines and issue complex structured queries. This paper contributes a customisable, ranking-based approach to processing such queries, specifically SQL queries. Just as term-based retrieval models exploit term-based statistics, ranking-aware processing of SQL queries exploits tuple-based statistics that are derived from the data sources or, more precisely, from the relations specified in the SQL query. To implement this ranking-based processing, we leverage PSQL, a probabilistic variant of SQL, which facilitates probability estimation and the generalisation of document retrieval models to tuple retrieval. The result is a general-purpose framework that can interpret any SQL query and then assign a probabilistic retrieval model to rank the results of that query. An evaluation on the IMDB and Monster benchmarks indicates that the PSQL-based approach is applicable to (semi-)structured and unstructured data and to structured queries.
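To make the idea of tuple-based statistics concrete, the following is a minimal sketch in plain Python with SQLite. It is not PSQL and not the paper's implementation. It assumes a hypothetical relation term_in_movie(term, movie), estimates P(term) as the fraction of movies whose tuples contain that term, and ranks movies by the summed IDF-style weight -log P(term) of the query terms they match. The table, function names, and sample rows are illustrative assumptions.

```python
import math
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical term_in_movie relation."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE term_in_movie (term TEXT, movie TEXT)")
    rows = [
        ("gladiator", "m1"), ("rome", "m1"),
        ("rome", "m2"), ("love", "m2"),
        ("rome", "m3"), ("love", "m3"),
    ]
    conn.executemany("INSERT INTO term_in_movie VALUES (?, ?)", rows)
    return conn


def rank_by_tuple_idf(conn, query_terms):
    """Rank movies by the sum of -log P(term) over the query terms they contain.

    P(term) is estimated from tuple statistics of the relation itself:
    (number of movies with a tuple for the term) / (number of movies).
    """
    n_movies = conn.execute(
        "SELECT COUNT(DISTINCT movie) FROM term_in_movie"
    ).fetchone()[0]
    scores = {}
    for term in query_terms:
        matches = conn.execute(
            "SELECT DISTINCT movie FROM term_in_movie WHERE term = ?", (term,)
        ).fetchall()
        if not matches:
            continue  # a term with no tuples contributes nothing
        idf = -math.log(len(matches) / n_movies)
        for (movie,) in matches:
            scores[movie] = scores.get(movie, 0.0) + idf
    # Highest score first; ties are broken by movie id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    conn = build_demo_db()
    for movie, score in rank_by_tuple_idf(conn, ["gladiator", "rome"]):
        print(movie, round(score, 4))
```

In this toy data, "rome" occurs in every movie and therefore carries zero weight, while the rarer "gladiator" decides the ranking. This mirrors how IDF behaves in document retrieval, here applied to tuples of a relation.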
