Scalable and Adaptive KNN for Regression Over Graphs

The family of $k$-nearest neighbor ($k$NN) schemes comprises simple yet effective algorithms for general machine learning tasks such as classification and regression. The goal of linear $k$NN regression is to predict the value of an unseen datum as a linear combination of its $k$ neighbors on a graph, which can be identified via a chosen distance metric. Despite the simplicity and effectiveness of $k$NN regression, a central problem facing this approach is the proper choice of $k$. Most conventional $k$NN algorithms simply apply the same $k$ to all data samples, implicitly assuming that the data lie on a regular graph in which each sample is connected to exactly $k$ neighbors. However, a constant choice may not be optimal for data lying in a heterogeneous feature space. On the other hand, existing algorithms that adaptively choose $k$ usually incur high computational complexity, especially for large training datasets. To cope with this challenge, this paper introduces a novel algorithm that adaptively chooses $k$ for each data sample, while greatly reducing training complexity by actively selecting training samples. Tests on real data corroborate the efficiency and effectiveness of the novel algorithm.
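The abstract does not spell out the proposed algorithm, so the following is a minimal sketch of the general idea it describes: $k$NN regression with a per-query adaptive choice of $k$, where the prediction is a linear combination (here, uniform weights $1/k$) of the nearest neighbors' targets. The hold-one-out selection rule and the candidate set `k_candidates` are illustrative assumptions, not the authors' method.

```python
import numpy as np

def adaptive_knn_regress(X_train, y_train, x_query, k_candidates=(1, 3, 5, 10, 20)):
    """Predict y at x_query as the mean of its k nearest training targets,
    choosing k per query from k_candidates.

    Selection heuristic (an assumption for illustration): hold out the single
    nearest neighbor and score each candidate k by how well the mean of the
    next k neighbors reconstructs the held-out target.
    """
    # Euclidean distances from the query to every training sample.
    dists = np.linalg.norm(X_train - x_query, axis=1)
    order = np.argsort(dists)

    best_k, best_score = None, np.inf
    for k in k_candidates:
        if k + 1 > len(order):
            break
        held_out_y = y_train[order[0]]
        neighbor_mean = y_train[order[1:k + 1]].mean()
        score = abs(neighbor_mean - held_out_y)
        if score < best_score:
            best_k, best_score = k, score

    # Final prediction: a linear combination of the chosen neighborhood's
    # targets with uniform weights 1/k.
    return y_train[order[:best_k]].mean(), best_k

# Usage on synthetic data:
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X[:, 0] ** 2 + 0.1 * rng.normal(size=200)
prediction, chosen_k = adaptive_knn_regress(X, y, rng.normal(size=3))
```

Note that this brute-force variant recomputes all distances per query; the scalability the paper claims comes from actively subsampling the training set, which this sketch does not attempt.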
