This paper presents an adaptive approach to query selectivity estimation for queries with a range selection condition on continuous attributes. The selectivity factor estimates the size of the data satisfying a query condition. This estimate is calculated at the initial stage of query processing in order to choose the optimal query execution plan. Calculating selectivity requires a non-parametric estimator of the probability density of the attribute value distribution. Most known approaches represent attribute value distributions with equi-width or equi-height histograms. The proposed approach uses a new type of histogram based on either the attribute value distribution or the distribution of the range bounds of the query selection condition. Applying the query-condition-aware histogram yields more accurate selectivity values than a standard histogram. The approach may be implemented as an extension of the Oracle DBMS query optimizer using the ODCI Stats module.
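To make the baseline concrete, the following is a minimal sketch of selectivity estimation for a range condition using a standard equi-width histogram. It illustrates the conventional technique the abstract compares against, not the proposed query-condition-aware histogram. The function names, the uniform-within-bucket assumption, and the sample data are all illustrative assumptions and do not come from the paper.

```python
from bisect import bisect_right


def build_equi_width_histogram(values, lo, hi, n_buckets):
    """Return (bounds, counts) for an equi-width histogram over [lo, hi].

    Hypothetical helper: values outside [lo, hi] are ignored.
    """
    width = (hi - lo) / n_buckets
    bounds = [lo + i * width for i in range(n_buckets + 1)]
    bounds[-1] = hi  # avoid floating-point drift on the last boundary
    counts = [0] * n_buckets
    for v in values:
        if lo <= v <= hi:
            # Locate the bucket; clamp so that v == hi falls in the last one.
            idx = min(bisect_right(bounds, v) - 1, n_buckets - 1)
            counts[idx] += 1
    return bounds, counts


def estimate_selectivity(bounds, counts, a, b):
    """Estimate the fraction of rows with a <= x <= b.

    Assumes values are uniformly distributed inside each bucket, so a
    partially covered bucket contributes in proportion to the overlap.
    """
    total = sum(counts)
    if total == 0 or b <= a:
        return 0.0
    selected = 0.0
    for i, count in enumerate(counts):
        left, right = bounds[i], bounds[i + 1]
        overlap = min(b, right) - max(a, left)
        if overlap > 0:
            selected += count * overlap / (right - left)
    return selected / total


if __name__ == "__main__":
    # Illustrative data only: ten attribute values spread over [0, 1].
    sample = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
    bounds, counts = build_equi_width_histogram(sample, 0.0, 1.0, 5)
    print(estimate_selectivity(bounds, counts, 0.1, 0.5))
```

The accuracy of such an estimator depends on where the bucket boundaries fall relative to the range bounds that queries actually use, which is the motivation for choosing boundaries from the query workload rather than from the data alone.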