A Novel Class Noise Detection Method for High-Dimensional Data in Industrial Informatics

Data in industrial informatics may be both high-dimensional and mislabeled. Irrelevant or noisy features pose a significant challenge to mislabeling detection in high-dimensional data. Traditional methods usually adopt a two-step solution: they first find the relevant feature subspace and then use it for mislabeling detection. Because this approach separates feature selection from label error detection, it struggles to achieve optimal detection performance. To solve this problem, in this article we integrate the two steps and propose a sequential ensemble noise filter (SENF). In the SENF, relevant features are selected and used to generate a noise score for each instance. In turn, these noise scores guide feature selection through regression learning. The SENF thus falls within the scope of sequential ensemble learning. We evaluate our approach on several benchmark datasets with high dimensionality and substantial label noise. The results show that the SENF performs significantly better than existing label noise detection methods.
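
The alternation described above, in which selected features produce noise scores and the noise scores then guide a regression-based feature selection, can be illustrated with a minimal sketch. It is not the SENF itself: the abstract does not specify the scorer, the regression model, or how the rounds are combined. The sketch assumes binary labels in {-1, +1}, a k-nearest-neighbour disagreement rate as the noise score, and a ridge regression onto a score-weighted label as the feature-ranking step. All function names and parameters (`knn_noise_scores`, `regression_feature_ranking`, `sequential_noise_filter`, `n_keep`, `n_rounds`, `alpha`) are hypothetical.

```python
import numpy as np


def knn_noise_scores(X, y, k=5):
    """Noise score = fraction of the k nearest neighbours with a different label.

    Assumption: this simple scorer stands in for whatever the SENF uses.
    """
    diff = X[:, None, :] - X[None, :, :]
    dist = (diff ** 2).sum(axis=2)          # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)          # an instance is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    return (y[neighbours] != y[:, None]).mean(axis=1)


def regression_feature_ranking(X, y, scores, alpha=1.0):
    """Rank features by |ridge coefficient| (most relevant first).

    Assumption: the regression target is the label weighted by the noise
    score, y * (1 - 2 * score), so a suspicious instance (score near 1)
    votes for the opposite label. Labels must be in {-1, +1}.
    """
    Xs = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)   # standardise columns
    target = y * (1.0 - 2.0 * scores)
    target = target - target.mean()
    gram = Xs.T @ Xs + alpha * np.eye(X.shape[1])
    weights = np.linalg.solve(gram, Xs.T @ target)        # closed-form ridge
    return np.argsort(-np.abs(weights))


def sequential_noise_filter(X, y, n_keep=5, n_rounds=3, k=5):
    """Alternate noise scoring and score-guided feature selection."""
    selected = np.arange(X.shape[1])
    scores = knn_noise_scores(X, y, k)      # first pass uses every feature
    for _ in range(n_rounds):
        ranking = regression_feature_ranking(X, y, scores)
        selected = np.sort(ranking[:n_keep])
        scores = knn_noise_scores(X[:, selected], y, k)
    # Only the final round's scores are returned; combining rounds is left open.
    return scores, selected
```

Instances whose final score exceeds a chosen threshold would be flagged as likely mislabeled. The threshold, `n_keep`, and `n_rounds` are free parameters of the sketch, not values taken from the article.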
