Efficient Fuzzy Top-k Query Processing over Uncertain Objects

Recently, many application domains, such as sensor network monitoring and location-based services, have raised the issue of uncertain data management. Uncertain objects, a kind of uncertain data, have uncertain attributes whose values are ranges rather than points. In this paper, we study a new kind of top-k query, the Probabilistic Fuzzy Top-k query (PF-Topk query), which returns k results from a set of uncertain objects under fuzzy query conditions. We formally define the PF-Topk query problem and present a framework for answering such queries. We propose an exact algorithm, the Envelope Planes of Membership Function (EPMF) algorithm, which uses upper and lower bounding functions to answer fuzzy top-k queries over uncertain objects efficiently in a high-dimensional query space. We also propose an approximate algorithm that improves efficiency while maintaining high precision when its parameter is set appropriately. To reduce the search space, we propose a pruning method that safely discards some objects before the query is evaluated. Theoretical analysis and experiments on synthetic and real datasets demonstrate the effectiveness and efficiency of our algorithms.
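
To make the setting concrete, the following is a minimal one-dimensional sketch of a fuzzy top-k query over interval-valued objects. It is not the EPMF algorithm, and the abstract does not give the formal PF-Topk definition. The sketch assumes that each uncertain attribute is uniformly distributed over its range, that the fuzzy condition is a triangular membership function, and that objects are ranked by expected membership degree. All names (`triangular`, `membership_bounds`, `expected_membership`, `fuzzy_top_k`) are hypothetical. The pruning step uses a generic bound-based rule: an object whose upper bound falls below the k-th largest lower bound cannot be in the result.

```python
from typing import Callable, Dict, List, Tuple

Interval = Tuple[float, float]  # (low, high) range of an uncertain attribute


def triangular(center: float, width: float) -> Callable[[float], float]:
    """Triangular membership function for a fuzzy condition like 'about center'."""
    def mu(x: float) -> float:
        return max(0.0, 1.0 - abs(x - center) / width)
    return mu


def membership_bounds(iv: Interval, center: float, width: float) -> Tuple[float, float]:
    """Lower and upper bounds of the membership degree over the interval.

    The triangular function is unimodal, so its maximum over the interval is
    at the point closest to the center and its minimum is at an endpoint.
    """
    mu = triangular(center, width)
    lo, hi = iv
    closest = min(max(center, lo), hi)
    return min(mu(lo), mu(hi)), mu(closest)


def expected_membership(iv: Interval, mu: Callable[[float], float], steps: int = 1000) -> float:
    """Expected membership under an ASSUMED uniform distribution (midpoint rule)."""
    lo, hi = iv
    if hi == lo:
        return mu(lo)
    h = (hi - lo) / steps
    return sum(mu(lo + (i + 0.5) * h) for i in range(steps)) / steps


def prune(objects: Dict[str, Interval], center: float, width: float, k: int) -> List[str]:
    """Keep only objects whose upper bound reaches the k-th largest lower bound."""
    bounds = {name: membership_bounds(iv, center, width) for name, iv in objects.items()}
    lowers = sorted((b[0] for b in bounds.values()), reverse=True)
    tau = lowers[k - 1] if len(lowers) >= k else 0.0
    return [name for name, (_, upper) in bounds.items() if upper >= tau]


def fuzzy_top_k(objects: Dict[str, Interval], center: float, width: float, k: int) -> List[Tuple[str, float]]:
    """Prune with bounds, then rank the survivors by expected membership."""
    mu = triangular(center, width)
    survivors = prune(objects, center, width, k)
    scored = [(name, expected_membership(objects[name], mu)) for name in survivors]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical sensor readings (temperature ranges) and the query "about 21 degrees".
readings = {"a": (20.0, 22.0), "b": (18.0, 30.0), "c": (40.0, 45.0), "d": (21.0, 21.0)}
survivors = prune(readings, center=21.0, width=5.0, k=2)
result = fuzzy_top_k(readings, center=21.0, width=5.0, k=2)
```

In this example, object `c` lies entirely outside the support of the membership function, so the bound check discards it before any integration is performed. Object `b` survives pruning because its upper bound is high, even though its expected membership turns out to be low.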
