Comparison of Brute-Force and K-D Tree Algorithm

Data mining, which may be viewed as the extraction of hidden predictive information from large databases, is a powerful technology with great potential for analyzing important information in a data warehouse. Nearest neighbor search (NNS), also known as proximity search, similarity search, or closest point search, is the optimization problem of finding the closest points in a metric space. Brute-force search is a very general problem-solving technique that consists of systematically enumerating all possible candidates for the solution and checking whether each candidate satisfies the problem statement. A brute-force algorithm for the string matching problem, for example, takes two inputs: a pattern and a text. A k-d tree, or k-dimensional tree, is a data structure for organizing points in a space with k dimensions. K-d trees are very useful for range and nearest neighbor searches. In this paper, we study and compare the k-d tree algorithm and the brute-force algorithm on several levels. We apply approximate k-nearest neighbor search with the k-d tree data structure and compare its performance attributes to those of the brute-force approach. The efficiency of each technique is evaluated for a given number of points in terms of distance and execution time, and the better of the two techniques is selected. The results show better performance for the k-d tree than for the brute-force approach. The aim is a faster, more accurate, and more efficient data structure, although the outcome depends primarily on the particular data set. The work can be extended by changing the k-d tree traversal technique; we propose a new modified traversal technique for the k-d tree.
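The two approaches compared above can be illustrated with a minimal Python sketch. It is not the implementation evaluated in the paper: it performs exact (not approximate) single nearest neighbor search with a standard depth-first traversal, not the modified traversal proposed here, and the names `brute_force_nn`, `build_kdtree`, and `kdtree_nn` are hypothetical, chosen only for illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def brute_force_nn(points: Sequence[Point], query: Point) -> Point:
    """Enumerate every candidate and keep the closest one (O(n) per query)."""
    return min(points, key=lambda p: math.dist(p, query))


class Node:
    """One k-d tree node: a point plus the axis it splits on."""

    def __init__(self, point: Point, axis: int,
                 left: "Optional[Node]", right: "Optional[Node]") -> None:
        self.point = point
        self.axis = axis
        self.left = left
        self.right = right


def build_kdtree(points: List[Point], depth: int = 0) -> Optional[Node]:
    """Split on the median, cycling through the k axes level by level."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(ordered[mid], axis,
                build_kdtree(ordered[:mid], depth + 1),
                build_kdtree(ordered[mid + 1:], depth + 1))


def kdtree_nn(node: Optional[Node], query: Point,
              best: Optional[Point] = None) -> Optional[Point]:
    """Depth-first search that prunes subtrees which cannot hold a closer point."""
    if node is None:
        return best
    if best is None or math.dist(node.point, query) < math.dist(best, query):
        best = node.point
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = kdtree_nn(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if abs(diff) < math.dist(best, query):
        best = kdtree_nn(far, query, best)
    return best


# Small illustrative data set (made up for this sketch, not taken from the paper).
sample_points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
sample_tree = build_kdtree(sample_points)
sample_query = (9, 2)
print(brute_force_nn(sample_points, sample_query))
print(kdtree_nn(sample_tree, sample_query))
```

Both functions return the same nearest point for a given query. The difference lies in the work done: the brute-force version always computes n distances, while the k-d tree version skips any subtree whose splitting plane lies farther away than the best match found so far.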
