Takagi-Sugeno Modeling of Incomplete Data for Missing Value Imputation With the Use of Alternate Learning

Missing values often occur in real-world datasets; they undermine data integrity and reduce the reliability of data mining. In this paper, a Takagi-Sugeno (TS) fuzzy modeling method for incomplete data is proposed and used to estimate missing values. Because the relationships among attributes can differ from one cluster to another, the method performs regression analysis on the subsets obtained by fuzzy clustering and builds the global model as a weighted sum of these local regression models; compared with traditional regression imputation, this describes the relationships between attributes more precisely. To address the problem of incomplete model input caused by missing values, we further propose an alternate learning strategy that trains the model parameters and the imputations in turn: missing values are treated as variables that take part in modeling the incomplete data, and the imputations are updated as the model parameters are adjusted. Through this strategy, the problem of incomplete model input is resolved, and the accuracy of the model and the quality of the imputations are improved jointly. Experimental results on several UCI datasets with different missing ratios and missing-data mechanisms demonstrate the effectiveness of the proposed method and strategy.
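
The alternate learning loop described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes standard fuzzy c-means (FCM) for the clustering step, weighted least squares with weights u_ik^m for each cluster's local linear (TS consequent) model, column-mean initialization of the missing entries, and fixed iteration limits. The function names `fcm`, `ts_predict_column`, and `alternate_impute` and all default parameter values are hypothetical. Each outer pass first holds the current imputations fixed and refits the memberships and local models, then holds those parameters fixed and overwrites only the missing entries with the membership-weighted sum of the local predictions.

```python
import numpy as np


def fcm(X, c, m=2.0, iters=100, seed=0):
    """Standard fuzzy c-means on complete data; returns memberships U (n x c)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        W = U ** m
        V = (W.T @ X) / W.sum(axis=0)[:, None]  # cluster centers (c x d)
        D = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + 1e-12
        # u_ik = 1 / sum_l (d_ik / d_il)^(2 / (m - 1))
        U = 1.0 / ((D[:, :, None] / D[:, None, :]) ** (2.0 / (m - 1))).sum(axis=2)
    return U


def ts_predict_column(X, U, j, observed, m=2.0, ridge=1e-6):
    """Predict attribute j with a TS model: one local linear regression per
    cluster (fit by weighted least squares on rows where j is observed),
    blended by the memberships U. Assumed formulation, for illustration."""
    n = X.shape[0]
    Z = np.hstack([np.ones((n, 1)), np.delete(X, j, axis=1)])  # [1, x_{-j}]
    pred = np.zeros(n)
    for k in range(U.shape[1]):
        w = (U[:, k] ** m) * observed  # rows with a missing target get zero weight
        A = Z.T @ (Z * w[:, None]) + ridge * np.eye(Z.shape[1])
        b = Z.T @ (w * X[:, j])
        theta = np.linalg.solve(A, b)  # consequent parameters of rule k
        pred += U[:, k] * (Z @ theta)  # rows of U sum to 1: weighted average
    return pred


def alternate_impute(X_missing, c=2, m=2.0, outer=20, tol=1e-6):
    """Alternate between (a) fitting clusters and local models given the
    current imputations and (b) updating imputations given the model."""
    miss = np.isnan(X_missing)
    X = np.where(miss, np.nanmean(X_missing, axis=0), X_missing)  # mean fill
    for _ in range(outer):
        U = fcm(X, c, m)  # (a) parameters given imputations
        X_new = X.copy()
        for j in range(X.shape[1]):
            if miss[:, j].any():
                pred = ts_predict_column(X, U, j, ~miss[:, j], m)
                X_new[miss[:, j], j] = pred[miss[:, j]]  # (b) imputations given parameters
        delta = np.max(np.abs(X_new - X))
        X = X_new
        if delta < tol:
            break
    return X
```

For example, `alternate_impute(X, c=3)` takes an n x d array with `np.nan` marking missing entries and returns a completed copy in which observed entries are left unchanged. Fitting each local model only on rows where the target attribute is observed, while feeding the current imputations in as inputs, is one simple way to treat missing inputs as variables; the paper's own update rules may differ.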
