A SVM Regression Based Approach to Filling in Missing Values

In the KDD process, filling in missing data typically requires a very large investment of time and energy; often 80% to 90% of a data analysis project is spent making the data reliable enough that the results can be trusted. In this paper, we propose an SVM regression based algorithm for filling in missing data: the decision attribute (output attribute) is treated as a condition attribute (input attribute), the condition attribute with missing values is treated as the decision attribute, and SVM regression is then used to predict the missing condition attribute values. Experimental results on a SARS data set show that the SVM regression method has the highest precision. The method that fills in a missing value with the value from the example at minimum distance from the example with the missing value takes second place, and the mean and median methods have lower precision.
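The attribute swap described above can be illustrated with a minimal sketch. It is not the paper's implementation: the use of scikit-learn's `SVR`, the function name `svr_impute`, the RBF kernel, and the `C` and `epsilon` values are all assumptions made for illustration, and the sketch further assumes that only one condition attribute contains missing values (encoded as NaN).

```python
import numpy as np
from sklearn.svm import SVR


def svr_impute(X, y, target_col):
    """Fill NaNs in one condition attribute using SVM regression.

    X          : 2-D array of condition attributes (NaN marks a missing value)
    y          : 1-D array holding the decision attribute
    target_col : index of the condition attribute to fill in

    Hypothetical helper; assumes all other columns of X are complete.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    missing = np.isnan(X[:, target_col])

    # Swap roles: the remaining condition attributes together with the
    # decision attribute become the regression inputs ...
    others = np.delete(X, target_col, axis=1)
    inputs = np.column_stack([others, y])
    # ... and the incomplete condition attribute becomes the regression target.
    target = X[:, target_col]

    # Kernel and hyperparameters are illustrative guesses, not values from the paper.
    model = SVR(kernel="rbf", C=10.0, epsilon=0.1)
    model.fit(inputs[~missing], target[~missing])

    filled = X.copy()
    if missing.any():
        filled[missing, target_col] = model.predict(inputs[missing])
    return filled
```

Only the rows where the target attribute is observed are used for training, and observed values are left untouched; the fitted model is applied solely to the rows with a missing value.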
