Exploration and Reduction of the Feature Space by Hierarchical Clustering

In this paper we propose and test the use of hierarchical clustering for feature selection. The clustering method is Ward's, with a distance measure based on the Goodman-Kruskal tau. We motivate the choice of this measure and compare it with alternative ones. The hierarchical clustering is applied to over 40 data sets from the UCI archive. The proposed approach is interesting from several viewpoints. First, it produces a dendrogram of feature subsets that serves as a valuable tool for studying relevance relationships among features. Second, the dendrogram is used within a feature selection algorithm that picks the best features by a wrapper method. Experiments were run with three different families of classifiers: Naive Bayes, decision trees, and k-nearest neighbours. With our feature selection, all three classifiers generally outperform their counterparts trained on the full feature set. We also compare our method with other state-of-the-art feature selection methods, obtaining on average better classification accuracy, though a smaller reduction in the number of features. Moreover, unlike other approaches to feature selection, our method requires no parameter tuning.
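To make the distance measure concrete, the following is a minimal sketch of the Goodman-Kruskal tau between two categorical features, i.e. the proportional reduction in Gini prediction error of one feature once the other is known. The `tau_distance` symmetrization shown (tau is asymmetric) is an illustrative assumption, not necessarily the exact form used in the paper:

```python
from collections import Counter


def gini_error(values):
    """Gini impurity: chance of misclassifying a random item drawn
    from `values` using only the marginal category frequencies."""
    n = len(values)
    return 1.0 - sum((c / n) ** 2 for c in Counter(values).values())


def goodman_kruskal_tau(x, y):
    """tau_{x->y}: proportional reduction in the Gini error of y
    obtained by conditioning on x. Ranges from 0 (x tells nothing
    about y) to 1 (x determines y)."""
    n = len(y)
    e_y = gini_error(y)
    if e_y == 0.0:
        return 0.0  # y is constant; knowing x cannot help
    # partition y by the value of x and average the within-group errors
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    e_y_given_x = sum(len(g) / n * gini_error(g) for g in groups.values())
    return (e_y - e_y_given_x) / e_y


def tau_distance(x, y):
    """One plausible symmetric dissimilarity between two features,
    usable as input to an agglomerative (e.g. Ward's) clustering."""
    return 1.0 - max(goodman_kruskal_tau(x, y), goodman_kruskal_tau(y, x))
```

For example, `goodman_kruskal_tau([0, 0, 1, 1], [0, 0, 1, 1])` is 1.0 (one feature fully determines the other, distance 0), while `goodman_kruskal_tau([0, 0, 1, 1], [0, 1, 0, 1])` is 0.0 (independent features, distance 1). A pairwise matrix of such distances over all features can then be fed to a standard Ward's agglomerative clustering routine to build the feature dendrogram.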
