A new incomplete pattern belief classification method with multiple estimations based on KNN

Abstract Classifying patterns with missing data is challenging: missing attributes introduce uncertainty into the classification result, and most classification methods produce only one estimation, which risks misclassification. A new incomplete pattern belief classification (PBC) method with multiple estimations based on K-nearest neighbors (KNNs) is proposed to deal with missing data. PBC first classifies the incomplete pattern preliminarily using its KNNs, which are found from the known attributes. A pattern whose KNNs all belong to a single class is assigned directly to that class. Otherwise, when the KNNs of the pattern span p classes (p ≤ c), p estimations are computed, one from the neighbors in each class, and the chosen classifier yields p classification results. A weighted possibility distance method then further divides the p classification results using the classification information of their KNNs. A pattern with similar possibility distances to different classes is assigned to a proper meta-class under the framework of belief function theory, which reflects the uncertainty caused by the missing values and reduces the error rate. Experiments on both artificial and real data sets show that PBC is effective for dealing with missing data.
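
The first stage described in the abstract (finding KNNs from the known attributes, then forming one estimation per class present among them) might be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `class_wise_estimates`, the use of Euclidean distance, and filling missing attributes with the unweighted mean of same-class neighbors are all assumptions, since the abstract does not specify how each estimation is computed.

```python
import numpy as np

def class_wise_estimates(x, X_train, y_train, k=5):
    """Return {class label: completed copy of x}, one entry per class
    found among the k nearest neighbors of the incomplete pattern x.

    Missing attributes in x are marked with NaN. The training data is
    assumed to be complete (an assumption made for this sketch).
    """
    x = np.asarray(x, dtype=float)
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)

    known = ~np.isnan(x)
    # Distances use only the attributes that are observed in x.
    dists = np.linalg.norm(X_train[:, known] - x[known], axis=1)
    nn_idx = np.argsort(dists, kind="stable")[:k]

    estimates = {}
    for label in np.unique(y_train[nn_idx]):
        members = nn_idx[y_train[nn_idx] == label]
        filled = x.copy()
        # Assumed imputation rule: unweighted mean of same-class neighbors.
        filled[~known] = X_train[members][:, ~known].mean(axis=0)
        estimates[label.item()] = filled
    return estimates
```

If the returned dictionary has a single key, the pattern can be assigned to that class directly. Otherwise each completed pattern would be passed to the chosen classifier, giving the p classification results that the weighted possibility distance step then compares.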
