Classifying patterns with missing values using Multi-Task Learning perceptrons

Datasets with missing values are frequent in real-world classification problems. Imputation of the missing values can naturally be treated as a set of secondary tasks, while classification remains the main task of any machine dealing with such datasets. Consequently, Multi-Task Learning (MTL) schemes offer an interesting alternative for solving missing data problems. In this paper, we propose an MTL-based method for training and operating a modified Multi-Layer Perceptron (MLP) architecture in incomplete data contexts. The proposed approach balances classification and imputation by exploiting the advantages of MTL. Extensive experimental comparisons with well-known imputation algorithms show that the approach provides excellent results: the method is never worse than the traditional algorithms, which is an important robustness property, and it clearly outperforms them in several problems.
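To make the idea of classification as the main task and imputation as secondary tasks concrete, the following is a minimal sketch and not the architecture proposed in the paper. It assumes a single shared hidden layer, zero-filling of missing inputs, a squared-error imputation loss computed only on observed features, and a hypothetical weight `lam` that trades off the two objectives. All function and parameter names are illustrative.

```python
import numpy as np

def init_params(n_in, n_hidden, n_classes, seed=0):
    """Random weights for a shared hidden layer and two output heads (illustrative)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, (n_in, n_hidden)),       # shared hidden layer
        "b1": np.zeros(n_hidden),
        "Wc": rng.normal(0.0, 0.1, (n_hidden, n_classes)),  # main task: classification
        "bc": np.zeros(n_classes),
        "Wi": rng.normal(0.0, 0.1, (n_hidden, n_in)),       # secondary tasks: one output per feature
        "bi": np.zeros(n_in),
    }

def forward(params, X, observed):
    """Return class probabilities and feature estimates for a batch.

    X may contain NaN at missing positions; `observed` is a boolean mask
    that is True where a feature value is present.
    """
    X_in = np.where(observed, X, 0.0)  # assumption: missing inputs are zero-filled
    H = np.tanh(X_in @ params["W1"] + params["b1"])
    logits = H @ params["Wc"] + params["bc"]
    logits = logits - logits.max(axis=1, keepdims=True)  # numerically stable softmax
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=1, keepdims=True)
    X_hat = H @ params["Wi"] + params["bi"]
    return probs, X_hat

def mtl_loss(params, X, observed, y, lam=0.5):
    """Cross-entropy (main task) plus lam * masked squared error (secondary tasks)."""
    probs, X_hat = forward(params, X, observed)
    n = X.shape[0]
    ce = -np.mean(np.log(probs[np.arange(n), y] + 1e-12))
    # Only observed features contribute: there is no target for a missing value.
    sq_err = np.where(observed, (X_hat - X) ** 2, 0.0)
    imp = sq_err.sum() / max(observed.sum(), 1)
    return ce + lam * imp, ce, imp
```

In this sketch, the entries of `X_hat` at missing positions would serve as imputed values at operation time, while `probs` gives the classification output. Because each feature here is both an input and a target, the sketch only illustrates how the two losses are combined; a practical design would need to prevent the imputation outputs from simply copying their own inputs.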
