Self-Organising Map for Data Imputation and Correction in Surveys

This paper addresses the detection of erroneous data and the imputation of missing values in surveys. We describe experiments conducted within a European project studying new statistical methods based on neural networks, and show that the self-organising map can be used successfully for both tasks. A self-organising map is calibrated on the available observations, each described by a set of correlated variables that are handled together. The map can then be used both to detect erroneous data and to impute values for partial observations. We apply these principles to a real-size transport survey database. The performance of our imputation model compares well with that of classical methods, and using a self-organising map for data correction yields an effective system for data validation, data correction and data analysis.
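
The two uses of the map described above can be illustrated with a minimal sketch. It is not the implementation used in the paper: the one-dimensional map, the learning-rate and radius schedules, the synthetic two-variable data and all function names (`train_som`, `impute`, `quantisation_error`, and so on) are assumptions made for illustration. The sketch assumes that a partial observation is matched to a prototype using only its observed components, that missing values are read from the matching prototype, and that an unusually large distance to the best prototype flags a suspect record.

```python
import math
import random


def partial_sqdist(x, w):
    """Squared Euclidean distance over the observed (non-None) components of x."""
    return sum((xi - wi) ** 2 for xi, wi in zip(x, w) if xi is not None)


def best_matching_unit(x, weights):
    """Index of the prototype closest to x on its observed components."""
    return min(range(len(weights)), key=lambda k: partial_sqdist(x, weights[k]))


def train_som(data, n_units=10, epochs=50, seed=0):
    """Calibrate a 1-D map (a chain of prototypes) on complete observations."""
    rng = random.Random(seed)
    weights = [list(rng.choice(data)) for _ in range(n_units)]
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / n_steps
            lr = 0.5 * (1.0 - frac) + 0.01  # assumed linear decay
            radius = max(n_units / 2.0 * (1.0 - frac), 0.5)
            bmu = best_matching_unit(x, weights)
            for k, w in enumerate(weights):
                # Gaussian neighbourhood along the chain of units
                h = math.exp(-((k - bmu) ** 2) / (2.0 * radius ** 2))
                for j in range(len(w)):
                    w[j] += lr * h * (x[j] - w[j])
            step += 1
    return weights


def impute(x, weights):
    """Fill the missing (None) components of x from its best-matching prototype."""
    w = weights[best_matching_unit(x, weights)]
    return [wi if xi is None else xi for xi, wi in zip(x, w)]


def quantisation_error(x, weights):
    """Distance from x to its best-matching prototype; large values are suspect."""
    return math.sqrt(partial_sqdist(x, weights[best_matching_unit(x, weights)]))


# Synthetic data (an assumption): two correlated variables with y close to 2x.
rng = random.Random(1)
data = [[x, 2.0 * x + rng.gauss(0.0, 0.02)] for x in (rng.random() for _ in range(200))]
weights = train_som(data)

completed = impute([0.5, None], weights)             # second variable missing
err_consistent = quantisation_error([0.5, 1.0], weights)
err_inconsistent = quantisation_error([0.5, 0.0], weights)  # violates the correlation
```

In this toy setting the imputed value should lie close to the trend of the data, and the record that violates the correlation should lie farther from the map than the consistent one. Turning that distance into a detection rule requires a threshold, which would have to be chosen from the data at hand.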
