Cost-sensitive KNN classification

Abstract: KNN (k-nearest neighbors) classification is one of the top-10 data mining algorithms. Making KNN classifiers cost-sensitive is important for classification applications with imbalanced data. This paper designs two efficient cost-sensitive KNN classification models, referred to as the Direct-CS-KNN classifier and the Distance-CS-KNN classifier. The two CS-KNN classifiers are further improved with existing strategies, such as smoothing, minimum-cost k-value selection, feature selection, and ensemble selection. We evaluate our methods on real data sets and show that our CS-KNN classifiers can significantly reduce misclassification cost.
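To make the idea of a cost-sensitive KNN decision concrete, here is a minimal sketch of one common formulation: estimate class probabilities from the votes of the k nearest neighbors (with Laplace smoothing), then predict the label with the lowest expected misclassification cost. The abstract does not define the Direct-CS-KNN or Distance-CS-KNN models, so this is not presented as either of them. The function name `knn_min_cost_predict`, the `cost[true][predicted]` layout, the Euclidean distance, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def knn_min_cost_predict(train_X, train_y, x, cost, k=3, smoothing=1.0):
    """Return (label, expected_costs) for query point x.

    cost[t][p] is the (assumed) cost of predicting p when the true label is t.
    """
    labels = sorted(cost)
    # Indices of the k training points closest to x (Euclidean distance).
    nearest = sorted(range(len(train_X)),
                     key=lambda i: math.dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    # Laplace-smoothed class probability estimates from the neighbor votes.
    total = len(nearest) + smoothing * len(labels)
    prob = {c: (votes[c] + smoothing) / total for c in labels}
    # Expected cost of each candidate prediction p, averaged over true labels t.
    expected = {p: sum(prob[t] * cost[t][p] for t in labels) for p in labels}
    return min(labels, key=lambda p: expected[p]), expected


# Toy imbalanced data: three negatives, one positive (illustrative only).
train_X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
train_y = ["neg", "neg", "neg", "pos"]

# Missing a positive is assumed to be ten times worse than a false alarm.
skewed_cost = {"neg": {"neg": 0, "pos": 1}, "pos": {"neg": 10, "pos": 0}}
uniform_cost = {"neg": {"neg": 0, "pos": 1}, "pos": {"neg": 1, "pos": 0}}

label_skewed, costs_skewed = knn_min_cost_predict(
    train_X, train_y, (0.9, 0.9), skewed_cost)
label_uniform, costs_uniform = knn_min_cost_predict(
    train_X, train_y, (0.9, 0.9), uniform_cost)
print(label_skewed, label_uniform)
```

In this toy setup two of the three nearest neighbors are negative, so with uniform costs the decision follows the majority vote, while the skewed cost matrix shifts the decision toward the rare positive class. Choosing k by minimum validation cost rather than minimum error would be a natural extension of the same function.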
