Granular data imputation: A framework of Granular Computing

Data imputation is realized with the use of fuzzy clustering. Data imputation leads to information granules rather than numeric entries. Information granules help quantify the quality of imputation. The use of granular imputation is shown in system modeling.

Data imputation is a common practice when dealing with incomplete data. Irrespective of the technique used, the results of imputation are commonly numeric, meaning that once the data have been imputed, the imputed entries are not distinguishable from the data originally available prior to imputation. In this study, the crux of the proposed approach is to represent imputed (missing) entries as information granules and, in this manner, quantify the quality of the imputation process and of the ensuing data. We establish a two-stage imputation mechanism in which we start with any method of numeric imputation and then form a granular representative of the missing value. In this sense, the approach can be regarded as an enhancement of existing imputation techniques.

Proceeding with the detailed imputation schemes, we discuss two ways of imputation. In the first, imputation is realized for individual variables of the data set and afterwards enhanced by the buildup of information granules. In the second, we use fuzzy clustering, namely Fuzzy C-Means (FCM), which helps establish a structure in the data; this structural information is then used in the imputation process.

The design of information granules invokes the fundamentals of Granular Computing, namely the principle of justifiable granularity and an allocation of information granularity. Numeric experiments on a suite of publicly available data sets offer detailed insights into the main facets of the overall design process and deliver a parametric analysis of the methods.
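The two-stage mechanism for individual variables can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not spell out: stage one uses the mean of the observed values as the numeric imputation, and stage two builds an interval granule around that value using one common formulation of the principle of justifiable granularity, in which each bound maximizes the product of coverage (the number of observed values enclosed) and specificity (here exp(-alpha * distance)). The function names `justifiable_bound` and `granular_impute` and the parameter `alpha` are hypothetical and chosen only for illustration.

```python
import numpy as np

def justifiable_bound(values, rep, alpha, upper=True):
    """Choose one interval bound by maximizing coverage * specificity.

    Assumed criterion (not taken from the abstract):
    coverage = number of observed values between rep and the candidate bound,
    specificity = exp(-alpha * |candidate - rep|).
    """
    side = values[values > rep] if upper else values[values < rep]
    best_bound, best_score = rep, 0.0
    for cand in np.unique(side):
        if upper:
            coverage = np.sum((values > rep) & (values <= cand))
        else:
            coverage = np.sum((values < rep) & (values >= cand))
        specificity = np.exp(-alpha * abs(cand - rep))
        score = coverage * specificity
        if score > best_score:
            best_bound, best_score = cand, score
    return float(best_bound)

def granular_impute(column, alpha=1.0):
    """Two-stage imputation of a single variable with NaN as missing marker."""
    column = np.asarray(column, dtype=float)
    missing = np.isnan(column)
    observed = column[~missing]

    # Stage 1: numeric imputation (assumption: mean of observed values).
    rep = float(observed.mean())

    # Stage 2: interval granule around the numeric representative.
    lower = justifiable_bound(observed, rep, alpha, upper=False)
    upper = justifiable_bound(observed, rep, alpha, upper=True)

    # Observed entries stay numeric (degenerate intervals [x, x]);
    # missing entries become the interval [lower, upper].
    lo = np.where(missing, lower, column)
    hi = np.where(missing, upper, column)
    return rep, np.column_stack([lo, hi])

if __name__ == "__main__":
    rep, granules = granular_impute([1.0, 2.0, np.nan, 3.0, 4.0, 10.0], alpha=1.0)
    print(rep)
    print(granules)
```

In this sketch a larger `alpha` penalizes wide intervals more strongly, so the granule assigned to a missing entry becomes narrower. The width of that granule serves as a simple indicator of imputation quality, and the granular entries remain distinguishable from the original numeric data.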
