Approximate Query Processing with Summary Tables in Statistical Databases

Statistical databases usually allow only statistical queries. To answer a query, some kind of summarization must be performed on the raw data. If the original data set is very large, as with Census data and the Current Population Survey, obtaining exact answers is extremely time-consuming. Thus, if the application tolerates some loss of precision in the answer, the query-answering mechanism can use previously computed summaries to answer other summary queries. In this paper we describe the notions needed to maintain a database of previously computed summary information, so that new summary queries can be answered quickly, with qualified accuracy, and without going back to the original data. We use the concept of summary tables, study the potential of sets of summary tables for answering queries, and organize these sets in a lattice structure.
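
As a rough illustration of answering a new summary query from a stored summary rather than from the raw data, consider the minimal sketch below. It assumes a summary table is a mapping from attribute-value tuples to counts, and that a table can answer a query whenever the query's attributes are a subset of the table's attributes. This subset ordering is one simple way to arrange summaries in a lattice, not necessarily the construction used in the paper. The names `roll_up` and `can_answer`, the attributes, and the counts are all hypothetical. The sketch covers only the exact roll-up case and does not model approximate answers or their accuracy qualification.

```python
from collections import defaultdict


def can_answer(table_attrs, query_attrs):
    """Assumed derivability test: the query's attributes must all be in the table."""
    return set(query_attrs) <= set(table_attrs)


def roll_up(summary, attrs, keep):
    """Aggregate a count summary over `attrs` down to the attributes in `keep`."""
    positions = [attrs.index(a) for a in keep]
    result = defaultdict(int)
    for key, count in summary.items():
        # Project each stored cell onto the kept attributes and add its count.
        result[tuple(key[i] for i in positions)] += count
    return dict(result)


# Hypothetical stored summary: counts by (region, age_group).
attrs = ("region", "age_group")
summary = {
    ("north", "0-17"): 120,
    ("north", "18-64"): 300,
    ("south", "0-17"): 80,
    ("south", "18-64"): 200,
}

# New query: counts by region only, answered without touching the raw data.
by_region = roll_up(summary, attrs, ("region",))
```

Here `by_region` is computed entirely from the stored table. The reverse direction, recovering counts by age group from a table keyed by region alone, fails the `can_answer` test, which is the asymmetry a lattice of summary tables would capture.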