Simultaneous feature selection and clustering using mixture models

Clustering is a common unsupervised learning technique used to discover group structure in a set of data. While many clustering algorithms exist, the important issue of feature selection, that is, which attributes of the data the clustering algorithm should use, is rarely addressed. Feature selection for clustering is difficult because, unlike in supervised learning, there are no class labels for the data and, thus, no obvious criteria to guide the search. Another important problem in clustering is determining the number of clusters, which both affects and is affected by the choice of features. In this paper, we propose the concept of feature saliency and introduce an expectation-maximization (EM) algorithm to estimate it in the context of mixture-based clustering. Because we adopt a minimum message length (MML) model selection criterion, the saliency of irrelevant features is driven toward zero, which corresponds to performing feature selection. The criterion and algorithm are then extended to simultaneously estimate the feature saliencies and the number of clusters.
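To make the notion of feature saliency concrete, the following is a minimal sketch under stated assumptions, not the paper's algorithm. The abstract gives no formulas, so the model here is assumed: each feature l is, with probability rho_l (its saliency), drawn from a component-specific Gaussian, and otherwise from a single Gaussian shared by all clusters. The function and variable names (`fit_saliency_mixture`, `rho`, `alpha`) are hypothetical. The updates are plain maximum-likelihood EM; the MML penalty described above is omitted, so saliencies of irrelevant features are not pushed toward zero in this sketch, and the number of clusters K is fixed.

```python
import numpy as np


def log_normal(y, mean, var):
    """Elementwise log-density of a univariate Gaussian."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (y - mean) ** 2 / var)


def fit_saliency_mixture(Y, K, n_iter=50, seed=0, var_floor=1e-6):
    """Maximum-likelihood EM for an assumed feature-saliency mixture.

    Assumption (not taken from the abstract): feature l follows a
    cluster-specific Gaussian with probability rho[l], and a Gaussian
    common to all clusters with probability 1 - rho[l].
    """
    rng = np.random.default_rng(seed)
    N, D = Y.shape
    eps, tiny = 1e-9, 1e-12

    alpha = np.full(K, 1.0 / K)                          # mixing weights
    mu = Y[rng.choice(N, size=K, replace=False)].copy()  # (K, D) means
    var = np.tile(Y.var(axis=0), (K, 1)) + var_floor     # (K, D) variances
    mu0 = Y.mean(axis=0)                                 # common mean
    var0 = Y.var(axis=0) + var_floor                     # common variance
    rho = np.full(D, 0.5)                                # feature saliencies
    loglik = []

    for _ in range(n_iter):
        # E-step, in the log domain for numerical stability.
        log_a = np.log(rho) + log_normal(Y[:, None, :], mu, var)       # (N, K, D)
        log_b = np.log1p(-rho) + log_normal(Y, mu0, var0)[:, None, :]  # (N, 1, D)
        log_c = np.logaddexp(log_a, log_b)                             # (N, K, D)
        log_joint = np.log(alpha) + log_c.sum(axis=2)                  # (N, K)
        log_norm = np.logaddexp.reduce(log_joint, axis=1)              # (N,)
        loglik.append(float(log_norm.sum()))

        w = np.exp(log_joint - log_norm[:, None])    # cluster responsibilities
        u = np.exp(log_a - log_c) * w[:, :, None]    # "feature is relevant"
        v = np.exp(log_b - log_c) * w[:, :, None]    # "feature is irrelevant"

        # M-step: weighted maximum-likelihood updates.
        Nu = u.sum(axis=0) + tiny                                      # (K, D)
        mu = (u * Y[:, None, :]).sum(axis=0) / Nu
        var = np.maximum((u * (Y[:, None, :] - mu) ** 2).sum(axis=0) / Nu,
                         var_floor)

        v_l = v.sum(axis=1)                                            # (N, D)
        Nv = v_l.sum(axis=0) + tiny                                    # (D,)
        mu0 = (v_l * Y).sum(axis=0) / Nv
        var0 = np.maximum((v_l * (Y - mu0) ** 2).sum(axis=0) / Nv, var_floor)

        alpha = np.maximum(w.sum(axis=0) / N, tiny)
        alpha = alpha / alpha.sum()
        # Clipping keeps log(rho) and log(1 - rho) finite.
        rho = np.clip(u.sum(axis=(0, 1)) / N, eps, 1.0 - eps)

    return {"rho": rho, "alpha": alpha, "mu": mu, "var": var, "loglik": loglik}
```

Calling `fit_saliency_mixture(Y, K=2)` on an `(N, D)` array returns the estimated saliencies in `result["rho"]`. On data where only some features separate the clusters, one would expect those features to receive higher saliency than pure-noise features, although without a penalty such as MML the estimates for irrelevant features need not approach zero.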
