Making the threshold algorithm access cost aware

Assume a database storing N objects, each with d numerical attributes (feature values). Every object in the database can be assigned an overall score that is derived from its individual feature values (and the feature values of a user-defined query). The problem considered here is to efficiently retrieve the k objects with the minimum (or maximum) overall score. The well-known threshold algorithm (TA) was proposed as a solution to this problem. TA views the database as a set of d sorted lists storing the feature values. Even though TA is optimal with regard to the number of accesses, its overall access cost can be high because, in practice, some list accesses may be more expensive than others. We therefore propose to make TA access cost aware by choosing the next list to access such that the overall cost is minimized. Our experimental results show that the resulting overall cost is close to the optimal cost and significantly lower than the cost of prior approaches.
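The Python sketch below illustrates why the choice of the next list matters. It is not the authors' algorithm. It assumes a sum aggregate, minimum scoring, nonnegative feature values, and a simple cost model with a fixed cost per sorted access and per random access on each list. All names (`threshold_topk`, `round_robin`, `greedy_gain_per_cost`) are hypothetical. `round_robin` mimics classic TA's fixed access order. `greedy_gain_per_cost` is a hypothetical heuristic invented only for illustration: it prefers the list whose bound grew the most per unit of sorted-access cost on its last access.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple

SortedList = List[Tuple[int, float]]  # (object_id, value), ascending by value
Chooser = Callable[[int, Sequence[float], Sequence[float]], int]


def round_robin(step: int, gains: Sequence[float], costs: Sequence[float]) -> int:
    """Classic TA schedule: visit the d lists in turn, ignoring costs."""
    return step % len(gains)


def greedy_gain_per_cost(step: int, gains: Sequence[float], costs: Sequence[float]) -> int:
    """Hypothetical cost-aware rule: pick the list whose most recent sorted
    access raised its bound the most per unit of sorted-access cost."""
    return max(range(len(gains)), key=lambda i: gains[i] / costs[i])


def threshold_topk(
    lists: Sequence[SortedList],
    k: int,
    sorted_cost: Sequence[float],
    random_cost: Sequence[float],
    choose: Chooser = round_robin,
) -> Tuple[List[Tuple[float, int]], float]:
    """Return the k (score, object_id) pairs with minimum summed score,
    plus the total access cost spent.

    Assumes nonnegative values (so 0.0 is a valid initial bound), that every
    list holds every object exactly once, and that all costs are positive.
    """
    d = len(lists)
    lookup: List[Dict[int, float]] = [dict(lst) for lst in lists]  # random access
    pos = [0] * d                   # next sorted-access position in each list
    last_seen = [0.0] * d           # lower bound on unseen values in each list
    gains = [float("inf")] * d      # bound increase from the last access
    best: List[Tuple[float, int]] = []  # max-heap of (-score, id), size <= k
    scored = set()
    cost = 0.0
    step = 0
    while True:
        i = choose(step, gains, sorted_cost)
        step += 1
        if pos[i] == len(lists[i]):
            break                   # list exhausted: every object has been scored
        obj, value = lists[i][pos[i]]
        pos[i] += 1
        cost += sorted_cost[i]
        gains[i] = value - last_seen[i]
        last_seen[i] = value
        if obj not in scored:
            # Random accesses fetch the object's remaining attribute values.
            scored.add(obj)
            score = value
            for j in range(d):
                if j != i:
                    score += lookup[j][obj]
                    cost += random_cost[j]
            heapq.heappush(best, (-score, obj))
            if len(best) > k:
                heapq.heappop(best)  # drop the current worst (largest) score
        # No unseen object can score below the sum of the per-list bounds.
        threshold = sum(last_seen)
        if len(best) == k and -best[0][0] <= threshold:
            break
    return sorted((-s, o) for s, o in best), cost


if __name__ == "__main__":
    A = [(0, 1.0), (2, 3.0), (1, 5.0), (3, 9.0)]
    B = [(3, 1.0), (1, 2.0), (2, 6.0), (0, 7.0)]
    # Sorted accesses on B cost ten times as much as on A.
    print(threshold_topk([A, B], 2, [1, 10], [1, 1]))
    # ([(7.0, 1), (8.0, 0)], 37.0)
    print(threshold_topk([A, B], 2, [1, 10], [1, 1], greedy_gain_per_cost))
    # ([(7.0, 1), (8.0, 0)], 18.0)
```

Both rules return the same top-2 objects on this toy data, but the greedy rule stops after spending 18 cost units instead of 37. The stopping test depends only on the per-list bounds, so in this sketch any access order yields an exact result and only the total cost changes.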
