Unsupervised Feature Ranking and Selection

Dimensionality reduction is important for the efficient handling of large data sets, and feature selection is an effective way to achieve it. Many supervised feature selection methods exist, but little work has been done on unsupervised feature ranking and selection, where class information is not available. In this chapter, we address the problem of identifying and selecting the important original features of unlabeled data. Our method is based on the observation that removing an irrelevant feature leaves the underlying concept (cluster structure) of the data essentially unchanged, whereas removing a relevant feature does not. We propose an entropy measure for ranking features and conduct experiments to verify that the proposed method finds the important features. For verification, we compare it with a feature ranking method (Relief) that requires class information, and we test the reduced data on clustering and model-construction tasks. This work can also be extended to dimensionality reduction for data with a continuous class attribute.

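As a rough illustration of the idea, the sketch below implements one plausible entropy-based ranking in Python. It assumes a pairwise-similarity entropy of the form S_ij = exp(-alpha * d_ij) with E = -sum(S_ij log S_ij + (1 - S_ij) log(1 - S_ij)), and scores each feature by how much the data entropy changes when that feature is removed; the exact similarity function, normalization, and ranking criterion used in the chapter may differ.

```python
import numpy as np

def pairwise_entropy(X, alpha=0.5):
    """Entropy of a data set based on pairwise similarities.

    Assumes S_ij = exp(-alpha * d_ij) for Euclidean distance d_ij;
    the entropy is low when points form tight clusters and high when
    they are spread uniformly.
    """
    # Pairwise Euclidean distances via broadcasting.
    diff = X[:, None, :] - X[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    s = np.exp(-alpha * d)
    # Use only the upper triangle (i < j); clip to avoid log(0).
    iu = np.triu_indices(len(X), k=1)
    s = np.clip(s[iu], 1e-12, 1 - 1e-12)
    return -np.sum(s * np.log2(s) + (1 - s) * np.log2(1 - s))

def rank_features_by_entropy(X):
    """Rank features by the entropy change caused by removing each one.

    The criterion used here (larger change in entropy => more important
    feature) is an illustrative assumption, not a faithful reproduction
    of the chapter's measure.
    """
    base = pairwise_entropy(X)
    scores = []
    for f in range(X.shape[1]):
        reduced = np.delete(X, f, axis=1)  # data with feature f removed
        scores.append(abs(pairwise_entropy(reduced) - base))
    # Indices of features, most important first.
    return np.argsort(scores)[::-1]
```

On synthetic data where only a few columns carry cluster structure and the rest are uniform noise, those columns should appear first in the returned ranking. Note that the O(n^2) pairwise computation would need subsampling for large data sets, which is consistent with the chapter's focus on efficiency.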