Probabilistic information retrieval approach for ranking of database query results

We investigate the problem of ranking the answers to a database query when many tuples are returned. In particular, we present methodologies for conjunctive and range queries that adapt principles of probabilistic information retrieval models to structured data. Our solution is domain-independent and leverages statistics and correlations from both the data and the query workload. We evaluate the quality of our approach with a user survey on a real database. Furthermore, we present and experimentally evaluate algorithms that efficiently retrieve the top-ranked results, demonstrating the feasibility of our ranking system.
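To make the idea of ranking with data and workload statistics concrete, the following toy sketch may help. It is not the scoring function of the paper: the ratio-based score, the smoothing constant `eps`, the function names `frequencies` and `rank`, and the sample `homes` and `workload` records are all assumptions made for illustration. Tuples matching a conjunctive query are scored on the attributes the query leaves unspecified, favoring values that appear often in past queries relative to how common they are in the data.

```python
from collections import Counter
import heapq

def frequencies(records, attributes):
    """Relative frequency of each (attribute, value) pair across dict records."""
    counts = Counter((a, rec[a]) for rec in records for a in attributes if a in rec)
    n = max(len(records), 1)
    return {key: c / n for key, c in counts.items()}

def rank(rows, workload, query, k, eps=0.01):
    """Return the k rows matching the conjunctive query with the highest score.

    Hypothetical score: product, over attributes not mentioned in the query,
    of (workload frequency + eps) / (data frequency + eps).
    """
    attributes = sorted({a for r in rows for a in r})
    unspecified = [a for a in attributes if a not in query]
    p_data = frequencies(rows, attributes)
    p_work = frequencies(workload, attributes)

    def score(row):
        s = 1.0
        for a in unspecified:
            key = (a, row.get(a))
            s *= (p_work.get(key, 0.0) + eps) / (p_data.get(key, 0.0) + eps)
        return s

    # Conjunctive selection: every condition in the query must hold.
    matches = [r for r in rows if all(r.get(a) == v for a, v in query.items())]
    return heapq.nlargest(k, matches, key=score)

# Made-up sample data, for illustration only.
homes = [
    {"city": "Seattle", "view": "water", "type": "condo"},
    {"city": "Seattle", "view": "street", "type": "condo"},
    {"city": "Seattle", "view": "street", "type": "house"},
    {"city": "Kirkland", "view": "water", "type": "house"},
]
workload = [
    {"city": "Seattle", "view": "water"},
    {"view": "water"},
    {"type": "house"},
]
top = rank(homes, workload, {"city": "Seattle"}, k=2)
```

In this sample, past queries ask for water views most often, so the Seattle home with a water view is ranked ahead of the other matches. A real system would also need to handle range conditions and correlations between attributes, which this sketch ignores.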
