Empirical evaluation of algorithms to impute missing values for financial dataset

When mining data on investment in different financial instruments, we encounter the problem of incomplete data. For more efficient analysis and more reliable results, the missing values in the data need to be estimated. Various approaches to missing value imputation have been proposed and compared in the literature. However, to the best of my knowledge, a performance analysis of K-means, Fuzzy K-means and Weighted K-means for computing missing values has not yet been carried out on a financial dataset. This paper analyzes the performance of these three algorithms in imputing missing values. Root mean square error (RMSE) is used as the evaluation criterion for comparing the three algorithms. The computation is done on data describing investment patterns in different financial instruments. The results show that the K-means algorithm suits the financial data best for missing value imputation in comparison with the other two variants.
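To make the evaluation setup concrete, the following is a minimal sketch of K-means-based imputation scored by RMSE. It is an illustration under stated assumptions, not the implementation used in the paper: the function names (`rmse`, `column_means`, `kmeans_impute`), the use of `None` to mark a missing entry, the partial-distance assignment rule and the toy data are all hypothetical. The Fuzzy and Weighted K-means variants are not shown.

```python
import math
import random

def rmse(true_vals, imputed_vals):
    """Root mean square error between held-out true values and their imputations."""
    n = len(true_vals)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true_vals, imputed_vals)) / n)

def column_means(rows):
    """Per-column mean over the observed (non-None) entries."""
    means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return means

def kmeans_impute(rows, k, n_iter=20, seed=0):
    """Fill each None entry with the matching coordinate of the row's centroid.

    Assumption: rows are assigned to clusters using only their observed
    coordinates, and centroids are averaged over observed values only.
    """
    rng = random.Random(seed)
    n_cols = len(rows[0])
    means = column_means(rows)
    # Initial centroids: k sampled rows, with gaps temporarily filled by column means.
    centroids = [[means[j] if v is None else v for j, v in enumerate(r)]
                 for r in rng.sample(rows, k)]

    def partial_dist(row, centroid):
        # Squared Euclidean distance over the observed coordinates only.
        return sum((v - c) ** 2 for v, c in zip(row, centroid) if v is not None)

    labels = [0] * len(rows)
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every row.
        labels = [min(range(k), key=lambda c: partial_dist(row, centroids[c]))
                  for row in rows]
        # Update step: per-coordinate mean of the observed values in each cluster.
        for c in range(k):
            for j in range(n_cols):
                observed = [r[j] for r, l in zip(rows, labels)
                            if l == c and r[j] is not None]
                if observed:
                    centroids[c][j] = sum(observed) / len(observed)
    # Imputation step: only the originally missing cells are replaced.
    return [[centroids[labels[i]][j] if v is None else v for j, v in enumerate(r)]
            for i, r in enumerate(rows)]

# Toy data (made up): two groups of records, with one value hidden in each group.
data = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [1.1, None],
        [10.0, 10.0], [10.2, 9.9], [9.8, 10.1], [10.1, None]]
hidden_truth = [1.0, 10.0]  # the values removed from rows 3 and 7

completed = kmeans_impute(data, k=2)
kmeans_error = rmse(hidden_truth, [completed[3][1], completed[7][1]])
mean_error = rmse(hidden_truth, [column_means(data)[1]] * 2)
print(f"K-means RMSE: {kmeans_error:.3f}, column-mean RMSE: {mean_error:.3f}")
```

The score is computed only over entries that were hidden on purpose, so the true values are known. The same harness could in principle score any other imputation method on the same masked entries.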
