Missing data imputation through GTM as a mixture of t-distributions

The Generative Topographic Mapping (GTM) was originally conceived as a probabilistic alternative to the well-known, neural network-inspired Self-Organizing Maps. The GTM can also be interpreted as a constrained mixture of distributions. In recent years, Student t-distributions have received much attention as an alternative to Gaussians in mixture models because of their robustness to outliers. In this paper, the GTM is redefined as a constrained mixture of t-distributions, the t-GTM, and the Expectation-Maximization algorithm used to fit the model to the data is modified to carry out missing data imputation. Several experiments show that the t-GTM successfully detects outliers while minimizing their impact on the estimation of the model parameters. They also show that the t-GTM imputes missing values more accurately overall than the standard Gaussian GTM.
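The robustness to outliers mentioned above can be illustrated with a minimal sketch. It is not the t-GTM itself: it assumes a single one-dimensional t-distributed component with a fixed scale `sigma2` and fixed degrees of freedom `nu` (both hypothetical values chosen for illustration), and it only re-estimates the location. In EM for a t-distribution, each point receives a weight (nu + d) / (nu + squared Mahalanobis distance), so distant points contribute little to the parameter update.

```python
# Illustrative sketch only: one 1-D t-distributed component, not the full t-GTM.
# nu (degrees of freedom) and sigma2 (scale) are assumed fixed for simplicity.

def t_weights(xs, mu, sigma2, nu):
    """E-step weights of a t-distribution: small for points far from mu."""
    d = 1  # data dimensionality in this toy example
    return [(nu + d) / (nu + (x - mu) ** 2 / sigma2) for x in xs]

def robust_mean(xs, nu=3.0, sigma2=1.0, iters=50):
    """Re-estimate the location by iterating the weighted-mean update."""
    mu = sum(xs) / len(xs)  # start from the ordinary (Gaussian) mean
    for _ in range(iters):
        w = t_weights(xs, mu, sigma2, nu)
        mu = sum(wi * xi for wi, xi in zip(w, xs)) / sum(w)
    return mu

data = [0.9, 1.0, 1.1, 0.95, 1.05, 25.0]  # last value is an outlier
gaussian_mean = sum(data) / len(data)     # pulled towards the outlier
t_mean = robust_mean(data)                # stays near the bulk of the data
final_weights = t_weights(data, t_mean, 1.0, 3.0)
```

Under these assumptions the outlier ends up with a weight far below that of the other points, which is also what makes such weights usable as an outlier indicator.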
