Mixture of Gaussians for distance estimation with missing data

Many data sets encountered in practice have missing values, but the majority of commonly studied machine learning methods cannot be applied directly to incomplete samples. However, most such methods depend only on the relative differences between samples rather than on their particular values, so one useful approach is to estimate the pairwise distances between all samples in the data set directly. This is accomplished by fitting a Gaussian mixture model to the data and using it to derive estimates of the distances. A variant of the model for high-dimensional data with missing values is also studied. Experimental simulations confirm that the proposed method provides accurate estimates compared to alternative methods for estimating distances. In particular, using the mixture model to estimate distances is on average more accurate than using the same model to impute the missing values and then calculating distances. The experimental evaluation additionally shows that more accurate distance estimates lead to improved prediction performance on classification and regression tasks when the distances are used as inputs to a neural network.
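The difference between estimating distances directly and imputing first can be illustrated with a minimal sketch. It is not the full method: it assumes a single Gaussian component with known mean and covariance (a mixture would combine such terms across components), and it assumes the samples are independent. Under these assumptions the expected squared distance equals the squared distance between the conditional-mean imputations plus the summed conditional variances of the missing entries, which is the term that plain imputation omits. The function names `conditional_fill` and `expected_sq_distances` are hypothetical and chosen for illustration only.

```python
import numpy as np

def conditional_fill(x, mu, sigma):
    """Return (x with NaNs replaced by conditional means,
    summed conditional variance of the missing entries)."""
    miss = np.isnan(x)
    filled = x.astype(float).copy()
    if not miss.any():
        return filled, 0.0
    obs = ~miss
    if not obs.any():
        # Nothing observed: fall back to the marginal mean and variance.
        filled[:] = mu
        return filled, float(np.trace(sigma))
    s_oo = sigma[np.ix_(obs, obs)]
    s_mo = sigma[np.ix_(miss, obs)]
    s_mm = sigma[np.ix_(miss, miss)]
    # gain = s_mo @ inv(s_oo), computed with a linear solve (s_oo is symmetric).
    gain = np.linalg.solve(s_oo, s_mo.T).T
    filled[miss] = mu[miss] + gain @ (x[obs] - mu[obs])
    cond_cov = s_mm - gain @ s_mo.T
    return filled, float(np.trace(cond_cov))

def expected_sq_distances(X, mu, sigma):
    """Expected squared Euclidean distances between all rows of X,
    where NaN marks a missing value."""
    results = [conditional_fill(x, mu, sigma) for x in X]
    F = np.vstack([f for f, _ in results])
    s = np.array([v for _, v in results])
    diff = F[:, None, :] - F[None, :, :]
    d2 = (diff ** 2).sum(axis=-1) + s[:, None] + s[None, :]
    np.fill_diagonal(d2, 0.0)  # a sample is at distance zero from itself
    return d2

# Toy example with assumed (not fitted) parameters.
mu = np.zeros(2)
sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
X = np.array([[1.0, np.nan],
              [0.0, 0.0]])
D = expected_sq_distances(X, mu, sigma)
print(D[0, 1])  # 1.25 from the imputed point plus 0.75 of conditional variance
```

In this toy case, imputing the missing value with its conditional mean (0.5) and then computing the squared distance gives 1.25, whereas the expected squared distance is 2.0. The gap of 0.75 is the conditional variance of the missing entry.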
