A Probabilistic Framework for Vague Queries and Imprecise Information in Databases

We describe a probabilistic learning model for vague queries and for missing or imprecise information in databases. Instead of retrieving only a set of answers, our approach yields a ranking of the database objects in response to a query. Relevance judgements from the user about the retrieved objects can be used to improve both the ranking for the current query and the overall retrieval quality of the system. To specify different kinds of conditions in vague queries, the notion of vague predicates is introduced. The underlying probabilistic model also makes it easy to handle imprecise or missing attribute values. In addition, the corresponding formulas can be applied in combination with standard predicates (from two-valued logic), thus extending standard database systems to cope with missing or imprecise data.
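As a rough illustration of ranking with vague predicates and missing values, the following minimal sketch scores each object against a set of vague conditions and sorts by the combined score. It is not the model's actual formulas: the exponential-decay predicate `approx_equal`, the product combination (which assumes independent conditions), the mean-score fallback for missing values, and all names and data are assumptions made for this example. Learning from relevance judgements is not shown.

```python
from math import exp

def approx_equal(value, target, scale):
    """Hypothetical vague predicate: score in (0, 1] that decays with distance."""
    return exp(-abs(value - target) / scale)

def condition_score(value, target, scale, known_values):
    """Score one vague condition for one attribute value.

    A missing value (None) receives the mean score over the known values
    of that attribute; this fallback is an assumption of the sketch.
    """
    if value is None:
        if not known_values:
            return 0.5  # arbitrary neutral score when nothing is known
        total = sum(approx_equal(v, target, scale) for v in known_values)
        return total / len(known_values)
    return approx_equal(value, target, scale)

def rank(objects, conditions):
    """Return (score, id) pairs, best first.

    Scores are multiplied across conditions, which assumes the
    conditions are independent of each other.
    """
    scored = []
    for obj in objects:
        score = 1.0
        for attr, target, scale in conditions:
            known = [o[attr] for o in objects if o.get(attr) is not None]
            score *= condition_score(obj.get(attr), target, scale, known)
        scored.append((score, obj["id"]))
    return sorted(scored, reverse=True)

# Illustrative data: object "b" has a missing weight.
objects = [
    {"id": "a", "price": 100, "weight": 2.0},
    {"id": "b", "price": 140, "weight": None},
    {"id": "c", "price": 300, "weight": 2.1},
]
# Vague query: price about 100 (scale 50) and weight about 2.0 (scale 0.5).
ranking = rank(objects, [("price", 100, 50.0), ("weight", 2.0, 0.5)])
```

Here object "a" matches both conditions exactly and is ranked first, while "b" is still ranked rather than excluded even though its weight is unknown.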
