Hubness-aware kNN classification of high-dimensional data in presence of label noise

Learning with label noise is an important issue in classification, since reliable data labels are not always available. In this paper we explore and evaluate a new approach to learning with label noise in intrinsically high-dimensional data, based on neighbor occurrence models for hubness-aware k-nearest neighbor classification. Hubness is an important aspect of the curse of dimensionality that negatively affects many types of similarity-based learning methods. As we will show, the emergence of hubs as centers of influence in high-dimensional data affects the learning process in the presence of label noise. We evaluate the potential impact of hub-centered noise by defining a hubness-proportional random label noise model, which is shown to induce a significantly higher kNN misclassification rate than uniform random label noise. We discuss real-world examples where hubness-correlated noise arises either naturally or as a consequence of an adversarial attack. Our experimental evaluation reveals that hubness-based fuzzy k-nearest neighbor classification and Naive Hubness-Bayesian k-nearest neighbor classification might be suitable for learning under label noise in intrinsically high-dimensional data, as they exhibit robustness to high levels of both uniform and hubness-proportional random label noise. The results demonstrate promising performance across several data domains.
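
The hubness-proportional noise model rests on the k-occurrence count N_k(x): the number of times a point x appears among the k nearest neighbors of other points. Below is a minimal sketch, not the authors' implementation, of injecting such noise: each point's flip probability is made proportional to N_k(x), scaled so the expected overall fraction of mislabeled points matches a target noise rate. Function names and the normalization scheme are illustrative assumptions; sklearn's NearestNeighbors is used only to compute the kNN lists.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def k_occurrence_counts(X, k=5):
    """N_k(x): how often each point occurs in the kNN lists of other points."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    # Column 0 of each kNN list is the query point itself; drop it.
    neighbor_idx = nn.kneighbors(X, return_distance=False)[:, 1:]
    return np.bincount(neighbor_idx.ravel(), minlength=X.shape[0])

def hubness_proportional_noise(y, counts, noise_rate=0.2, n_classes=None, rng=None):
    """Flip each label with probability proportional to N_k(x), so that the
    expected fraction of mislabeled points equals `noise_rate`."""
    rng = np.random.default_rng(rng)
    n_classes = n_classes or int(y.max()) + 1
    # Since counts sum to n*k, this gives p_i = noise_rate * N_k(x_i) / k,
    # i.e. an average flip probability of noise_rate, concentrated on hubs.
    p = np.clip(counts / counts.sum() * noise_rate * len(y), 0.0, 1.0)
    y_noisy = y.copy()
    for i in np.flatnonzero(rng.random(len(y)) < p):
        wrong = [c for c in range(n_classes) if c != y[i]]
        y_noisy[i] = rng.choice(wrong)  # assign a random incorrect class
    return y_noisy
```

Under uniform random noise every point would have flip probability noise_rate; this model spends the same noise budget preferentially on hubs, and each mislabeled hub corrupts many kNN voting lists at once, which is why the induced misclassification rate is so much higher.

The robustness of the hubness-aware classifiers comes from letting each neighbor vote with its past occurrence profile rather than its (possibly noisy) label. The following is a hedged sketch in the spirit of hubness-based fuzzy kNN voting, under the assumption that each training point's vote is its Laplace-smoothed class-conditional occurrence fraction; the smoothing constant lam and the function names are illustrative, and the published method additionally handles anti-hubs and distance weighting.

```python
def class_occurrence_fractions(X, y, k=5, n_classes=None, lam=1.0):
    """u_c(x): fraction of x's k-occurrences that come from class-c points,
    Laplace-smoothed so anti-hubs with N_k(x) = 0 still get usable votes."""
    n = len(X)
    n_classes = n_classes or int(y.max()) + 1
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    idx = nn.kneighbors(X, return_distance=False)[:, 1:]
    occ = np.zeros((n, n_classes))
    for i in range(n):
        # Each neighbor of point i receives one class-y[i] occurrence.
        occ[idx[i], y[i]] += 1.0
    return (occ + lam) / (occ.sum(axis=1, keepdims=True) + lam * n_classes)

def hfnn_predict(X_train, u, X_query, k=5):
    """Neighbors vote with their occurrence profiles u instead of crisp labels,
    discounting 'bad hubs' whose occurrences spread across several classes."""
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    idx = nn.kneighbors(X_query, return_distance=False)
    return u[idx].sum(axis=1).argmax(axis=1)
```

A mislabeled hub's own label never enters the vote here; its influence is determined by the classes of the many points it co-occurs with, which is what makes occurrence models comparatively stable under label noise.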
