On feature selection through clustering

We study a feature selection algorithm that clusters attributes using a special metric and then uses the dendrogram of the resulting cluster hierarchy to choose the most relevant attributes. The main interest of our technique lies in the improved understanding it gives of the structure of the analyzed data and of the relative importance of the attributes for the selection process.
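
The pipeline described above (cluster the attributes, then read the selection off the cluster hierarchy) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation. The abstract does not name its metric, so variation of information between the partitions induced by two nominal attributes stands in for it. Average linkage, cutting the hierarchy at k clusters, and keeping from each cluster the attribute closest to the class column are likewise assumptions. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of the partition induced by a column of values."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def partition_distance(a, b):
    """Variation of information, H(A|B) + H(B|A) = 2*H(A,B) - H(A) - H(B).

    Assumption: a stand-in for the unspecified metric of the abstract.
    """
    joint = entropy(list(zip(a, b)))
    return 2 * joint - entropy(a) - entropy(b)


def cluster_attributes(columns, k):
    """Average-linkage agglomerative clustering of attribute names into k groups.

    Returns (clusters, merges); merges records the dendrogram as
    (left, right, height) tuples in merge order.
    """
    names = list(columns)
    dist = {frozenset(p): partition_distance(columns[p[0]], columns[p[1]])
            for p in combinations(names, 2)}
    clusters = [(name,) for name in names]
    merges = []

    def linkage(c1, c2):
        total = sum(dist[frozenset((x, y))] for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Merge the two closest clusters and record the merge height.
        c1, c2 = min(combinations(clusters, 2), key=lambda p: linkage(*p))
        merges.append((c1, c2, linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return clusters, merges


def select_features(columns, target, k):
    """Assumed rule: keep, per cluster, the attribute nearest to the target."""
    clusters, _ = cluster_attributes(columns, k)
    return sorted(min(c, key=lambda name: partition_distance(columns[name], target))
                  for c in clusters)


# Toy data: "a_copy" induces the same partition as "a", so the two are redundant.
data = {
    "a": [0, 0, 1, 1, 0, 0, 1, 1],
    "a_copy": ["x", "x", "y", "y", "x", "x", "y", "y"],
    "b": [0, 1, 0, 1, 0, 1, 0, 1],
    "c": [0, 0, 0, 0, 1, 1, 1, 1],
}
target = [0, 0, 1, 1, 0, 0, 1, 1]
selected = select_features(data, target, k=3)
print(selected)  # ['a', 'b', 'c']
```

In this toy run the redundant pair "a" and "a_copy" merges first at height 0, so only one of the two survives the selection. The merge heights returned by cluster_attributes are what a dendrogram would display, which is where the structure of the attribute set becomes visible.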
