Enhancing data analysis: uncertainty-resistance method for handling incomplete data

In data analysis, incomplete data are common and can significantly affect the conclusions drawn from the data. Incomplete data also introduce uncertainty, which can lead to unreliable results; hence, effective techniques for imputing the missing values are crucial. Missing or incomplete data and noise are two common sources of uncertainty. This paper introduces an effective method for imputing missing values that is robust to the uncertainty arising from both incompleteness and noise. First, a kernel-based method is designed to remove noise. Next, the class of each incomplete sample is determined using belief function theory. Finally, every missing dimension is imputed with the mean value of the same dimension over the members of the determined class. The performance of the method has been evaluated on real-world data sets from the UCI repository, and comparisons with state-of-the-art methods show that the proposed method achieves superior classification accuracy.
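The three steps described above (kernel-based noise removal, belief-function class assignment, class-mean imputation) can be illustrated with a small pure-Python sketch. This is a hypothetical stand-in, not the authors' algorithm: the noise filter (a sample is kept only if its mean RBF-kernel similarity to its own class is at least its similarity to every other class), the use of Denoeux's evidential k-NN rule with Dempster's combination and the pignistic transform for class assignment, and the parameters `sigma`, `k`, `alpha` and `gamma` are all assumptions chosen for illustration. Missing values are encoded as `None`, and only complete samples form the reference set.

```python
import math
from itertools import product


def rbf(a, b, dims, sigma=1.0):
    """Gaussian (RBF) kernel computed only over the dimensions in `dims`."""
    d2 = sum((a[j] - b[j]) ** 2 for j in dims)
    return math.exp(-d2 / (2 * sigma ** 2))


def remove_noise(X, y, sigma=1.0):
    """Hypothetical kernel filter (assumption, not the paper's rule): keep a
    complete sample only if its mean kernel similarity to the other members
    of its own class is at least its mean similarity to every other class."""
    dims = range(len(X[0]))
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        scores = {}
        for c in set(y):
            sims = [rbf(xi, xj, dims, sigma)
                    for j, (xj, yj) in enumerate(zip(X, y)) if yj == c and j != i]
            scores[c] = sum(sims) / len(sims) if sims else 0.0
        if scores[yi] >= max(scores.values()):
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def dempster(m1, m2):
    """Dempster's rule of combination for mass functions keyed by frozenset."""
    combined, conflict = {}, 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}


def evidential_class(x, X, y, k=3, alpha=0.95, gamma=1.0):
    """Evidential k-NN stand-in (assumption): each of the k nearest complete
    samples, measured on x's observed dimensions only, supports its own class
    with mass alpha * exp(-gamma * d^2); the rest goes to the whole frame."""
    omega = frozenset(y)
    obs = [j for j, v in enumerate(x) if v is not None]

    def dist2(i):
        return sum((x[j] - X[i][j]) ** 2 for j in obs)

    m = {omega: 1.0}  # vacuous mass function: total ignorance
    for i in sorted(range(len(X)), key=dist2)[:k]:
        s = alpha * math.exp(-gamma * dist2(i))
        m = dempster(m, {frozenset([y[i]]): s, omega: 1.0 - s})
    # Pignistic transform: split each focal set's mass equally among its classes.
    betp = {c: sum(v / len(focal) for focal, v in m.items() if c in focal)
            for c in omega}
    return max(betp, key=betp.get)


def impute(x, X, y, **kw):
    """Fill each missing dimension with that dimension's mean over the
    (cleaned) members of the class chosen by the belief-function step."""
    c = evidential_class(x, X, y, **kw)
    members = [xi for xi, yi in zip(X, y) if yi == c]
    filled = [v if v is not None else sum(mem[j] for mem in members) / len(members)
              for j, v in enumerate(x)]
    return filled, c


# Toy data (hypothetical): two clusters; the last sample is deliberately mislabeled.
X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.8], [4.9, 5.1], [1.0, 0.9]]
y = [0, 0, 0, 1, 1, 1, 1]
Xc, yc = remove_noise(X, y)
filled, cls = impute([None, 4.9], Xc, yc)
print(cls, filled)
```

In this toy run, the mislabeled sample `[1.0, 0.9]` is filtered out before imputation, so the missing first coordinate of `[None, 4.9]` is filled with the class-1 mean of about 5.03 instead of a value pulled toward the outlier. Computing distances only over observed dimensions is what lets an incomplete sample be compared with complete ones without first guessing its missing values.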
