A new iterative fuzzy clustering algorithm for multiple imputation of missing data

This paper proposes a new iterative fuzzy clustering (IFC) algorithm for imputing missing values in datasets. At each iteration, the information provided by fuzzy clustering is used to update the imputed values. The algorithm's performance is examined through experiments on three commonly used datasets and a case study on a city mobility database. The results show that the IFC algorithm not only works well for datasets with few missing values but also imputes effectively when the proportion of missing data is high.
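The abstract does not spell out the update rule, so the following is only a minimal sketch of the general idea of clustering-driven iterative imputation, not the IFC algorithm itself. It assumes standard fuzzy c-means with fuzzifier `m`, column-mean initialisation of the missing entries, and a membership-weighted average of cluster centres as the update. All function and parameter names are hypothetical, and the sketch produces a single completed dataset rather than multiple imputations.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means on a complete matrix X.

    Returns (centers, memberships); memberships has one row per sample.
    """
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)  # each row sums to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # guard against division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        shift = np.abs(U_new - U).max()
        U = U_new
        if shift < tol:
            break
    return centers, U


def iterative_fuzzy_impute(X, n_clusters=2, m=2.0, max_rounds=50, tol=1e-6, seed=0):
    """Illustrative iterative imputation (an assumption, not the paper's rule).

    Missing entries (NaN) start at the column mean and are then repeatedly
    replaced by the membership-weighted average of the cluster centres.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(max_rounds):
        centers, U = fuzzy_c_means(filled, n_clusters, m=m, seed=seed)
        estimate = U @ centers  # one estimate per cell
        updated = np.where(missing, estimate, X)  # observed cells never change
        change = np.abs(updated - filled).max()
        filled = updated
        if change < tol:
            break
    return filled


# Toy data: two well-separated groups, one missing value in the last row.
data = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, np.nan],
])
completed = iterative_fuzzy_impute(data, n_clusters=2)
print(completed[-1])
```

In this toy case the imputed entry starts at the column mean, which lies between the two groups, and is pulled toward the group its row belongs to as the memberships sharpen over successive rounds.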
