Imputing manufacturing material in data mining

Data is a vital source of information for organizations, especially in the information-technology era. In practice, however, databases are rarely perfect: values are often missing, and analyses based on such incomplete data may produce biased or misleading results. Imputing missing values is therefore regarded as one of the major steps in data mining. This research uses several data mining methods to construct imputation models suited to different types of missing data. When the missing data are continuous, regression models and neural networks are used to build the imputation models; when they are categorical, logistic regression, neural networks, C5.0 and CART are used. The results show that the regression model gives the best estimates of continuous missing values, whereas C5.0 performs best for categorical missing values.
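The abstract does not give the models' variables or settings, so the following is only a minimal sketch, in Python with the standard library, of the two imputation ideas it describes: a missing continuous value is predicted by a least-squares regression fitted on the complete cases, and a missing categorical value is predicted by a classifier fitted on the complete cases. The classifier is deliberately simplified to a single Gini-based split (a depth-one, CART-style tree); it is not C5.0 or a full CART, and neural networks and logistic regression are not shown. The field names (`temp`, `hardness`, `grade`), the helper functions and the data are all hypothetical.

```python
from collections import Counter


def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b * x on the complete cases."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def impute_continuous(rows, x_key, y_key):
    """Replace None in y_key with the regression prediction from x_key."""
    complete = [r for r in rows if r[y_key] is not None]
    a, b = fit_linear([r[x_key] for r in complete], [r[y_key] for r in complete])
    out = []
    for r in rows:
        r = dict(r)  # copy so the input rows are not mutated
        if r[y_key] is None:
            r[y_key] = a + b * r[x_key]
        out.append(r)
    return out


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(xs, labels):
    """One binary split on x chosen by minimum weighted Gini impurity (depth-one, CART-style)."""
    pairs = sorted(zip(xs, labels))
    mode = Counter(labels).most_common(1)[0][0]
    # (score, threshold, left label, right label); falls back to the overall mode
    best = (float("inf"), float("inf"), mode, mode)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal x values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for x, lab in pairs if x <= t]
        right = [lab for x, lab in pairs if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[0]:
            best = (score, t, Counter(left).most_common(1)[0][0],
                    Counter(right).most_common(1)[0][0])
    return best[1:]


def impute_categorical(rows, x_key, c_key):
    """Replace None in c_key with the class predicted by the fitted stump."""
    complete = [r for r in rows if r[c_key] is not None]
    t, left_label, right_label = fit_stump([r[x_key] for r in complete],
                                           [r[c_key] for r in complete])
    out = []
    for r in rows:
        r = dict(r)  # copy so the input rows are not mutated
        if r[c_key] is None:
            r[c_key] = left_label if r[x_key] <= t else right_label
        out.append(r)
    return out


# Hypothetical process records: temperature (predictor), hardness (continuous), grade (categorical).
rows = [
    {"temp": 100, "hardness": 50.0, "grade": "A"},
    {"temp": 110, "hardness": 55.0, "grade": "A"},
    {"temp": 120, "hardness": None, "grade": "B"},
    {"temp": 130, "hardness": 65.0, "grade": None},
    {"temp": 140, "hardness": 70.0, "grade": "B"},
]

print(impute_continuous(rows, "temp", "hardness")[2]["hardness"])  # 60.0
print(impute_categorical(rows, "temp", "grade")[3]["grade"])  # B
```

Both imputers follow the same pattern: fit on the rows where the target is observed, then predict only the missing entries. A realistic model would use several predictors and a full tree learner; a single predictor keeps the fitting logic visible.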
