Adaptive kNN using expected accuracy for classification of geo-spatial data

The k-Nearest Neighbor (kNN) classification approach is conceptually simple yet widely applied, since it often performs well in practical applications. However, using a global constant k does not always provide an optimal solution, e.g., for datasets with an irregular density distribution of data points. This paper proposes an adaptive kNN classifier where k is chosen dynamically for each instance (point) to be classified, such that the expected accuracy of classification is maximized. We define the expected accuracy as the classification accuracy achieved on a set of structurally similar observations. An arbitrary similarity function can be used to find these observations; we introduce and evaluate several such similarity functions. For the evaluation, we use five different classification tasks based on geo-spatial data, each consisting of (tens of) thousands of items. We demonstrate that the presented expected accuracy measures are a good estimator of kNN performance and that the proposed adaptive kNN classifier outperforms both standard kNN and previously introduced adaptive kNN algorithms. We also show that the range of considered k values can be significantly reduced to speed up the algorithm without a negative effect on classification accuracy.
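
The following is a minimal sketch of the general idea described above, not the paper's exact algorithm: for each query point, a per-instance k is selected by estimating the accuracy that kNN with that k would achieve on a set of similar training observations, and the query is then classified with the best-scoring k. Here "similar observations" are approximated simply by the query's nearest training points, whereas the paper allows an arbitrary similarity function; the function names, candidate k range, and neighborhood size are hypothetical choices for illustration.

```python
# Illustrative sketch of per-instance k selection via estimated ("expected") accuracy.
# Not the paper's exact method; similarity is approximated by nearest neighbors.
import numpy as np
from collections import Counter
from sklearn.neighbors import NearestNeighbors


def adaptive_knn_predict(train_X, train_y, query,
                         candidate_ks=(1, 3, 5, 7, 9), n_similar=30):
    """Classify `query` with a k chosen per instance by estimated accuracy."""
    train_X = np.asarray(train_X)
    train_y = np.asarray(train_y)
    query = np.asarray(query)
    index = NearestNeighbors().fit(train_X)

    # "Similar observations": here simply the query's nearest training points.
    _, sim = index.kneighbors(query.reshape(1, -1), n_neighbors=n_similar)
    sim = sim[0]

    best_k, best_acc = candidate_ks[0], -1.0
    for k in candidate_ks:
        # Estimated accuracy of this k: leave-one-out kNN accuracy on the similar set.
        correct = 0
        for i in sim:
            _, nbrs = index.kneighbors(train_X[i].reshape(1, -1), n_neighbors=k + 1)
            nbrs = [j for j in nbrs[0] if j != i][:k]  # drop the point itself
            pred = Counter(train_y[nbrs]).most_common(1)[0][0]
            correct += int(pred == train_y[i])
        acc = correct / len(sim)
        if acc > best_acc:
            best_k, best_acc = k, acc

    # Final prediction with the per-instance best k.
    _, nbrs = index.kneighbors(query.reshape(1, -1), n_neighbors=best_k)
    return Counter(train_y[nbrs[0]]).most_common(1)[0][0]
```

In this sketch, restricting `candidate_ks` to a small range mirrors the paper's observation that the range of considered k can be reduced to speed up classification.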
