Efficient Multidimensional AkNN Query Processing in the Cloud

A k-nearest neighbor (kNN) query determines the k points nearest to a given location under some distance metric. An all k-nearest neighbor (AkNN) query is a variation of the kNN query that retrieves the k nearest points for every point in a database. These queries are used mainly in spatial databases, and they form the backbone of many location-based applications, among others. In this work, we propose a novel method for classifying multidimensional data using an AkNN algorithm in the MapReduce framework. Our approach exploits space decomposition techniques to carry out the classification procedure in a parallel and distributed manner. To our knowledge, we are the first to study the kNN classification of multidimensional objects from this perspective. Through an extensive experimental evaluation, we show that our solution is efficient, robust and scalable in processing the given queries.
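
To make the query definitions concrete, the following is a minimal single-machine sketch of kNN, AkNN and kNN-based classification by brute force. It is not the space-decomposition MapReduce method proposed in this work. The function names (`knn`, `aknn`, `classify`), the use of Euclidean distance and the majority-vote rule are assumptions chosen for illustration only.

```python
import heapq
import math
from collections import Counter


def knn(query, points, k):
    """Return the k points closest to `query` (Euclidean distance assumed)."""
    return heapq.nsmallest(k, points, key=lambda p: math.dist(query, p))


def aknn(points, k):
    """For every point, return its k nearest neighbours among the other points.

    Brute force: O(n^2) distance computations, shown only to illustrate
    what an AkNN query returns.
    """
    result = {}
    for i, p in enumerate(points):
        others = points[:i] + points[i + 1:]  # exclude the point itself
        result[p] = knn(p, others, k)
    return result


def classify(query, labelled, k):
    """Assign `query` the majority label among its k nearest labelled points.

    `labelled` is a list of (point, label) pairs; majority voting is an
    assumed rule, not necessarily the one used in the paper.
    """
    nearest = heapq.nsmallest(
        k, labelled, key=lambda item: math.dist(query, item[0])
    )
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

For example, with the points `(0.0, 0.0)`, `(1.0, 0.0)`, `(5.0, 5.0)` and `(6.0, 5.0)`, calling `aknn(points, 1)` pairs each point with the other point in its own cluster. The quadratic cost of this approach on large inputs is what motivates distributing the computation.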
