Towards improving cluster-based feature selection with a simplified silhouette filter

This paper proposes a filter-based algorithm for feature selection. The filter partitions the set of features into clusters. The number of clusters, and consequently the cardinality of the subset of selected features, is estimated automatically from the data. The computational complexity of the proposed algorithm is also investigated. A variant of the filter that considers feature-class correlations is also proposed for classification problems. Empirical results on ten datasets illustrate the performance of the developed algorithm, which in general obtained competitive classification accuracy when compared to state-of-the-art algorithms that find clusters of features. We show that, if computational efficiency is an important issue, the proposed filter may be preferred over its counterparts, making it eligible to join a pool of feature selection algorithms to be used in practice. As an additional contribution of this work, a theoretical framework is used to formally analyze some properties of feature selection methods that rely on finding clusters of features.
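To make the idea concrete, the following is a minimal sketch of a cluster-based filter of this general kind, not the paper's exact algorithm. It rests on several assumptions that the abstract does not state: features are compared with the distance 1 - |Pearson correlation|, they are partitioned with a basic k-medoids loop, each candidate number of clusters is scored with a simplified silhouette (distances to medoids rather than to all cluster members, with singleton clusters scoring 0), and the medoids of the best-scoring partition are returned as the selected features. All function and parameter names are hypothetical.

```python
import numpy as np

def feature_distance(X):
    """Pairwise feature distance, assumed here to be 1 - |Pearson correlation|."""
    D = 1.0 - np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(D, 0.0)
    return D

def k_medoids(D, k, rng, n_iter=100):
    """Basic k-medoids on a precomputed distance matrix (illustrative, not optimized)."""
    n = D.shape[0]
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            if members.size:
                # New medoid: member with the smallest total distance to its cluster.
                within = D[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    return medoids, labels

def simplified_silhouette(D, medoids, labels):
    """Mean of (b - a) / max(a, b), with a and b measured to medoids only."""
    d = D[:, medoids]
    idx = np.arange(D.shape[0])
    a = d[idx, labels]                      # distance to own medoid
    d_other = d.copy()
    d_other[idx, labels] = np.inf
    b = d_other.min(axis=1)                 # distance to nearest other medoid
    denom = np.maximum(a, b)
    s = np.where(denom > 0, (b - a) / np.where(denom > 0, denom, 1.0), 0.0)
    # Assumed convention: members of singleton clusters score 0.
    counts = np.bincount(labels, minlength=len(medoids))
    s[counts[labels] == 1] = 0.0
    return float(s.mean())

def select_features(X, k_max=None, n_restarts=10, seed=0):
    """Return (indices of selected features, best simplified silhouette)."""
    rng = np.random.default_rng(seed)
    D = feature_distance(X)
    n = D.shape[0]
    k_max = k_max or n - 1
    best_score, best_medoids = -np.inf, None
    for k in range(2, k_max + 1):
        for _ in range(n_restarts):
            medoids, labels = k_medoids(D, k, rng)
            score = simplified_silhouette(D, medoids, labels)
            if score > best_score:
                best_score, best_medoids = score, np.sort(medoids)
    return best_medoids, best_score
```

Calling `select_features(X)` on a samples-by-features array returns one representative feature per cluster, so the size of the selected subset follows from the number of clusters that maximizes the score rather than from a user-supplied parameter.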
