k-Nearest neighbor searching in hybrid spaces

Little work has been reported on supporting k-nearest neighbor (k-NN) queries in hybrid data spaces (HDS). An HDS combines continuous dimensions with non-ordered discrete dimensions, and this combination presents new challenges in data organization and search ordering. In this paper, we present an algorithm for k-NN searches that uses a multidimensional index structure in hybrid data spaces. We examine the concept of search stages and use the properties of an HDS to derive a new search heuristic that greatly reduces the number of disk accesses in the initial stage of searching. Further, we present a performance model that estimates the cost of performing such searches. Our experimental results demonstrate the effectiveness of our algorithm and the accuracy of our performance estimation model.

Highlights

Developed an algorithm for searching multidimensional hybrid data spaces.
Introduced a method of improving search performance by examining search stages.
Suggested a new search heuristic that improves the initial stage of searching by 33%.
Derived a theoretical model that accurately predicts the performance of the algorithm.
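To make the notion of a k-NN query over a hybrid data space concrete, the following is a minimal sketch rather than the index-based algorithm described above. It is a plain linear scan with no index structure, no search stages, and no pruning heuristic. The distance measure is an assumption made for illustration only: it counts mismatches on the non-ordered discrete dimensions and adds the absolute differences on the continuous dimensions. The names `hybrid_distance` and `knn_linear_scan` are hypothetical.

```python
import heapq
from typing import List, Sequence, Tuple

# A hybrid vector is modelled as (discrete values, continuous values).
# This representation is an assumption made for the example.
HybridVector = Tuple[Sequence[str], Sequence[float]]


def hybrid_distance(a: HybridVector, b: HybridVector) -> float:
    """Assumed distance: discrete mismatches plus continuous absolute differences."""
    a_disc, a_cont = a
    b_disc, b_cont = b
    # Non-ordered discrete values can only be compared for equality.
    mismatches = sum(1 for x, y in zip(a_disc, b_disc) if x != y)
    # Continuous values contribute their absolute difference.
    continuous = sum(abs(x - y) for x, y in zip(a_cont, b_cont))
    return mismatches + continuous


def knn_linear_scan(
    data: List[HybridVector], query: HybridVector, k: int
) -> List[Tuple[float, int]]:
    """Return the k smallest (distance, index) pairs by scanning every vector."""
    scored = ((hybrid_distance(v, query), i) for i, v in enumerate(data))
    return heapq.nsmallest(k, scored)


data = [
    (("a", "g"), (0.1, 0.2)),
    (("a", "t"), (0.1, 0.2)),
    (("c", "t"), (0.9, 0.9)),
]
query = (("a", "g"), (0.1, 0.2))
neighbors = knn_linear_scan(data, query, k=2)
```

A scan like this touches every vector, which is the cost an index-based search with a good ordering heuristic aims to avoid; it is useful mainly as a correctness baseline.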
