The effect of noise and sample size on an unsupervised feature selection method for manifold learning

Research on unsupervised feature selection is scarce compared with that on supervised models, even though feature selection matters for many clustering problems. An unsupervised feature selection method for general finite mixture models was recently proposed and subsequently extended to Generative Topographic Mapping (GTM), a constrained mixture model for manifold learning that also provides data visualization. Some results of a previous partial assessment of this method for GTM suggested that its performance may be affected by insufficient sample size and by noisy data. In this brief study, we examine these limitations in some detail.
