Emerging topics and challenges of learning from noisy data in nonstandard classification: a survey beyond binary class noise

The problem of class-noisy instances is ubiquitous across classification tasks. However, most research focuses on noise handling in binary classification and on its adaptation to multiclass learning. This paper contextualizes label noise in non-binary classification problems, including multiclass, multilabel, multitask, multi-instance, ordinal, and data stream classification. Practical considerations for analyzing noise under these classification settings are discussed, together with current trends, open problems, and future research directions. We believe this paper can help expand research on class noise handling and help practitioners better identify the particular aspects of noise in challenging classification scenarios.
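
To make the contrast between clean and noisy supervision concrete, the sketch below injects symmetric (uniform) class noise into the training split of a multiclass dataset, which is the usual controlled setup for studying noise-handling methods. This is a minimal illustration, not the survey's own procedure: the helper name, the 20% noise rate, and the use of scikit-learn's iris data are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

def inject_uniform_class_noise(y, noise_rate, rng):
    """Flip a fraction `noise_rate` of labels to a different class chosen uniformly at random."""
    y_noisy = y.copy()
    classes = np.unique(y)
    n_flip = int(round(noise_rate * len(y)))
    flip_idx = rng.choice(len(y), size=n_flip, replace=False)
    for i in flip_idx:
        # Pick any label other than the current (assumed correct) one.
        candidates = classes[classes != y[i]]
        y_noisy[i] = rng.choice(candidates)
    return y_noisy

rng = np.random.default_rng(0)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, stratify=y)

# Corrupt 20% of the training labels only; the test set stays clean for evaluation.
y_train_noisy = inject_uniform_class_noise(y_train, noise_rate=0.20, rng=rng)
print("fraction of training labels changed:", np.mean(y_train_noisy != y_train))
```

Asymmetric (class-conditional) noise, the other model commonly studied, would instead flip labels according to a per-class transition matrix rather than uniformly.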
