Scalable and Adaptive KNN for Regression Over Graphs

The family of $k$-nearest neighbor ($k$NN) schemes comprises simple yet effective algorithms for general machine learning tasks such as classification and regression. The goal of linear $k$NN regression is to predict the value of an unseen datum as a linear combination of its $k$ neighbors on a graph, which can be identified via a chosen distance metric. Despite the simplicity and effectiveness of $k$NN regression, a central problem facing this approach is the proper choice of $k$. Most conventional $k$NN algorithms simply apply the same $k$ to all data samples, implicitly assuming that the data lie on a regular graph in which each sample is connected to exactly $k$ neighbors. However, a constant choice may not be optimal for data lying in a heterogeneous feature space. On the other hand, existing algorithms that adaptively choose $k$ usually incur high computational complexity, especially for large training datasets. To cope with this challenge, this paper introduces a novel algorithm that adaptively chooses $k$ for each data sample, while greatly reducing training complexity by actively selecting training samples. Tests on real data corroborate the efficiency and effectiveness of the novel algorithm.
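The abstract does not spell out the proposed algorithm, so the following is a minimal sketch of the general idea it describes: $k$NN regression with a per-query adaptive choice of $k$, where the prediction is a linear combination (here, uniform weights $1/k$) of the nearest neighbors' targets. The hold-one-out selection rule and the candidate set `k_candidates` are illustrative assumptions, not the authors' method.

```python
import numpy as np

def adaptive_knn_regress(X_train, y_train, x_query, k_candidates=(1, 3, 5, 10, 20)):
    """Predict y at x_query as the mean of its k nearest training targets,
    choosing k per query from k_candidates.

    Selection heuristic (an assumption for illustration): hold out the single
    nearest neighbor and score each candidate k by how well the mean of the
    next k neighbors reconstructs the held-out target.
    """
    # Euclidean distances from the query to every training sample.
    dists = np.linalg.norm(X_train - x_query, axis=1)
    order = np.argsort(dists)

    best_k, best_score = None, np.inf
    for k in k_candidates:
        if k + 1 > len(order):
            break
        held_out_y = y_train[order[0]]
        neighbor_mean = y_train[order[1:k + 1]].mean()
        score = abs(neighbor_mean - held_out_y)
        if score < best_score:
            best_k, best_score = k, score

    # Final prediction: a linear combination of the chosen neighborhood's
    # targets with uniform weights 1/k.
    return y_train[order[:best_k]].mean(), best_k

# Usage on synthetic data:
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X[:, 0] ** 2 + 0.1 * rng.normal(size=200)
prediction, chosen_k = adaptive_knn_regress(X, y, rng.normal(size=3))
```

Note that this brute-force variant recomputes all distances per query; the scalability the paper claims comes from actively subsampling the training set, which this sketch does not attempt.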
