Ranking-based processing of SQL queries

A growing number of applications are built on top of search engines and issue complex structured queries. This paper contributes a customisable, ranking-based approach to processing such queries, specifically SQL queries. Just as term-based retrieval models exploit term-based statistics, ranking-based processing of SQL queries exploits tuple-based statistics derived from the relations specified in the SQL query. To implement this processing, we leverage PSQL, a probabilistic variant of SQL, which facilitates probability estimation and the generalisation of document retrieval models to tuple retrieval. The result is a general-purpose framework that can interpret any SQL query and assign a probabilistic retrieval model to rank its results. The evaluation on the IMDB and Monster benchmarks shows that the PSQL-based approach is applicable to (semi-)structured and unstructured data as well as to structured queries.
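The analogy between term statistics and tuple statistics can be illustrated with a small sketch. This is not PSQL and not the authors' implementation; it is a minimal Python illustration under stated assumptions: the toy relation `MOVIES`, its attribute names, and the helper functions `value_probabilities` and `rank` are all hypothetical. Each attribute value gets a probability estimated from the relation itself (the fraction of tuples carrying it), and a tuple's score is the sum of IDF-style weights, `-log P(value)`, over the query predicates it satisfies.

```python
import math
from collections import Counter

# Hypothetical toy relation; attribute names and values are illustrative only.
MOVIES = [
    {"id": 1, "genre": "drama", "year": 1994, "country": "us"},
    {"id": 2, "genre": "drama", "year": 1994, "country": "uk"},
    {"id": 3, "genre": "comedy", "year": 1994, "country": "us"},
    {"id": 4, "genre": "drama", "year": 2001, "country": "us"},
]


def value_probabilities(relation, attribute):
    """Estimate P(value) for one attribute as the fraction of tuples carrying it."""
    counts = Counter(row[attribute] for row in relation)
    total = len(relation)
    return {value: n / total for value, n in counts.items()}


def rank(relation, predicates):
    """Rank tuples that satisfy at least one equality predicate.

    The score is the sum of -log P(value) over satisfied predicates, so rare
    values contribute more, in the spirit of inverse document frequency.
    """
    probs = {attr: value_probabilities(relation, attr) for attr in predicates}
    scored = []
    for row in relation:
        matched = [a for a, wanted in predicates.items() if row[a] == wanted]
        if not matched:
            continue  # tuples matching no predicate are not retrieved
        score = sum(-math.log(probs[a][predicates[a]]) for a in matched)
        scored.append((score, row["id"]))
    # Highest score first; ties broken by id for a deterministic order.
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


if __name__ == "__main__":
    for score, tuple_id in rank(MOVIES, {"genre": "comedy", "year": 1994}):
        print(tuple_id, round(score, 3))
```

In this sketch the rare value (`comedy`) dominates the score while the common one (`1994`) contributes little, which mirrors how IDF treats rare and frequent terms. A probabilistic SQL layer would express the same estimation declaratively over the relations named in the query rather than in application code.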

Hany Azzam | Sirvan Yahyaei | Thomas Roelleke
