Feature Selection Based on the Rough Set Theory and EM Clustering Algorithm

We study rough set theory as a method of feature selection based on tolerance classes, which extend the equivalence classes of the classical theory. Determining the initial tolerance classes is a challenging and important task for accurate feature selection and classification. In this paper, the EM clustering algorithm is applied to determine which objects are similar. On a number of benchmarks from the UCI repository, the method selects fewer features, with equal or higher accuracy, than two existing methods: Fuzzy Rough Feature Selection and Tolerance-based Feature Selection.
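The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes that each feature is clustered separately with a small one-dimensional Gaussian-mixture EM, that two objects count as similar when they fall in the same cluster on every selected feature, and that features are chosen by a greedy forward search on the rough set dependency degree (in the style of QuickReduct). The function names `em_1d`, `dependency`, and `greedy_select`, the number of clusters, and the toy data are all hypothetical. Note that hard cluster assignments make the similarity relation an equivalence relation, which is a special case of a tolerance relation.

```python
import math

def em_1d(values, k=2, iters=50):
    """Fit a k-component 1-D Gaussian mixture with EM; return hard cluster labels."""
    lo, hi = min(values), max(values)
    # Deterministic start (an assumption): means spread evenly over the range.
    means = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    variances = [max((hi - lo) ** 2 / (4 * k * k), 1e-6)] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in values:
            dens = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / len(values)
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, values)) / nj
            variances[j] = max(var, 1e-6)  # floor keeps the density finite
    return [max(range(k), key=lambda j: r[j]) for r in resp]

def dependency(labels_per_feature, decisions, subset):
    """Dependency degree: fraction of objects whose similarity class
    (same cluster on every feature in `subset`) has a single decision value."""
    n = len(decisions)
    key = lambda i: tuple(labels_per_feature[f][i] for f in subset)
    pos = 0
    for i in range(n):
        similar = [j for j in range(n) if key(j) == key(i)]
        if all(decisions[j] == decisions[i] for j in similar):
            pos += 1
    return pos / n

def greedy_select(labels_per_feature, decisions):
    """Add the feature that most increases the dependency degree until the
    subset is as informative as the full feature set."""
    all_feats = list(range(len(labels_per_feature)))
    target = dependency(labels_per_feature, decisions, all_feats)
    chosen, best = [], 0.0
    while best < target:
        cand = max((f for f in all_feats if f not in chosen),
                   key=lambda f: dependency(labels_per_feature, decisions, chosen + [f]))
        chosen.append(cand)
        best = dependency(labels_per_feature, decisions, chosen)
    return chosen

# Toy data (made up for illustration): feature 0 tracks the decision, feature 1 does not.
features = [
    [0.10, 0.20, 0.15, 0.90, 1.00, 0.95],
    [0.50, 0.90, 0.55, 0.52, 0.95, 0.88],
]
decisions = [0, 0, 0, 1, 1, 1]
labels = [em_1d(column) for column in features]
print(greedy_select(labels, decisions))
```

On this toy data the search should keep only feature 0, since its clusters already separate the two decision values while the clusters of feature 1 mix them. A soft variant, in which two objects are similar when their EM responsibilities overlap above a threshold, would give a genuinely non-transitive tolerance relation; that choice is left open here.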
